This study presents new approaches for practical problems related to using crop
models  in  precision  agriculture.  Agriculture  is  becoming  increasingly  competitive  and 
regulated.  Farmers  must  maximize  profits  yet  decrease  their  farms'  environmental 
impact.  Precision  agriculture  has  been  proposed  as  a  way  to  improve  farmers'  income  and 
minimize  the  environmental  impact  of  farming  by  optimizing  the  applied  levels  of 
fertilizers  and  other  crop  inputs  on  a  site-specific  basis.  However,  for  spatially  variable 
prescriptions  to  be  effective,  farmers  need  to  thoroughly  understand  how  several 
interacting  physical  and  biological  factors  contribute  to  cause  spatial  yield  variability. 

Crop  simulation  models  are  software  programs  that  imitate  plant  growth  and 
development.  They  can  help  us  understand  spatial  yield  variability  and  how  to  manage  it. 
However,  crop  models  have  expensive  and  impractical  soil  data  requirements,  especially 
for  spatial  applications.  A  technique  called  inverse  modeling  uses  the  crop  models 
themselves to search for the model parameters that best fit observed results. This
technique is very convenient for practical applications in precision agriculture, but its
current  state  of  development  does  not  ensure  good  predictive  power. 
Our objectives were:

•  To  identify  and  quantitatively  compare  different  sources  of  error  in  the  use  of 
inverse  modeling  to  parameterize  spatially  coupled  and  uncoupled  crop  models. 

•  To  develop  methods  for  optimizing  spatial  sampling  schemes  for  representing  the 
spatiotemporal  variability  of  yield  and  yield-limiting  factors. 

• To develop and evaluate a portable framework for eliciting knowledge from
experts and using that knowledge to parameterize a spatial crop model.

We found that crop yield spatiotemporal variability in a field can be represented
using a limited number of sampling locations; that those locations can be found using
efficient combinatorial optimization algorithms; and that in many applications crop
model results in the sampling locations can be kept within acceptable error levels without
needing the computationally intensive coupling (i.e., interchange of water) between
simulation locations. This can be facilitated by imposing a set of spatial constraints on the
system during the inverse modeling process. The constraints can be elicited from local
domain experts.




CHAPTER  1 
INTRODUCTION 

Precision  Agriculture  and  Crop  Models 

Present-day  row  crop  (maize,  soybeans,  etc.)  agriculture  is  beset  by  economic  and 
environmental  problems.  Commodity  prices  decreased  steadily  through  the  20th  Century 
(USDA  NASS,  1994),  increasing  the  economic  risk  of  agricultural  production.  Moreover, 
growing  levels  of  environmental  regulatory  pressure  have  limited  farmers'  ability  to 
manage  risk.  The  limitations  include  water  body  contamination  limits  and  total  maximum 
daily  loads  (EPA,  2003),  market  limitations  on  the  use  of  genetically  modified  organisms, 
and  competition  between  agricultural  and  urban  water  use. 

Precision, or site-specific, agriculture has been proposed as a way of improving
farmers' income and reducing the environmental impact of agriculture by optimizing the
applied doses of fertilizers and other inputs on a site-specific basis (NRC, 1997).
Precision  agriculture  merges  several  enabling  technologies  such  as  global  positioning 
systems  (GPS),  geographical  information  systems  (GIS),  and  real-time  variable-rate 
application  technology  (Morgan  and  Ess,  1997).  Site-specific  management  requires 
equipment  that  can  vary  application  rates  in  real  time  while  moving  through  the  field 
(Anderson  and  Humburg,  1997).  Another  necessary  component  for  variable-rate 
application  is  the  spatially-variable  prescription  (i.e.,  site-specific  dosage  of  crop  inputs) 
(Morgan  and  Ess,  1997). 

Effective  prescriptions  require  a  thorough  understanding  of  the  causes  of  spatial 
yield variability, as well as objective methods for predicting crop yield responses to
changes in specific inputs. In the case of variable-rate application of fertilizers,
prescriptions are usually driven by yield goals and soil test results, and are generally
based on crop and soil nutrient budgeting (Hergert et al., 1997). However, the response
functions that are used to make recommendations from these inputs are frequently based
on soil test results originally aggregated over a whole field, and often organized at the
state or regional level (Lowenberg-DeBoer and Swinton, 1997). Additionally, they may
be biased toward over-application because of assumptions made regarding farmers'
preferences (Hergert et al., 1997). As a result, the crop may respond unexpectedly to
variable-rate application, which reduces the perceived value of site-specific
variable-rate technology.

Precision agriculture has, so far, contributed less to farm decision makers'
understanding of the causes of spatial yield variability than to the development
of machinery to monitor spatial yield variability and apply prescriptions. There have been
many  advances  in  techniques  for  measuring  and  representing  spatial  yield  variability  for 
many  crops  (Pierce  et  al.,  1997).  Yield  monitors  have  been  developed  using  real-time 
direct  volume  methods  (Borgelt,  1993;  Searcy  et  al.,  1989);  real-time  direct  weighing 
methods  (Schrock  et  al.,  1995;  Wagner  and  Schrock,  1989);  and  indirect,  pressure-plate 
methods  (Birrell  et  al.,  1996).  Their  accuracy  has  also  been  studied,  both  in  laboratory 
and  field  settings  (Al-Mahasneh  and  Colvin,  2000;  Arslan  and  Colvin,  1999).  Yield 
monitors  and  yield  mapping  methods  have  been  developed  for  many  crops,  such  as  barley 
(Stafford et al., 1996); maize (Perez Munoz and Colvin, 1996; Pfeiffer et al., 1993);
peanuts (Boydell et al., 1995); potato (Campbell et al., 1994; Rawlins et al., 1995);
soybean and maize (Jaynes and Colvin, 1997); sugarbeets (Hofman et al., 1995); and
wheat (Miller et al., 1988). There is also a large body of research on variable-rate
technology  (Anderson  and  Humburg,  1997),  including  the  development  of  appropriate 
spray  nozzles  (Miller  and  Smith,  1992);  spinner-disc  fertilizer  applicators  (Fulton  et  al., 
2001);  the  special  case  of  center  pivots  (Camp  and  Sadler,  1998);  control  requirements 
(Paice  et  al.,  1996);  and  accuracy  analysis  (Goense,  1997;  Way  et  al.,  1992;  Weber  et  al., 
1993). 

Many  researchers  have  sought  to  understand  spatially  variable  yield-limiting 
factors of major crops using statistical regression analysis. Several of these studies
focused on maize (Braga, 2000; Everett and Pierce, 1996; Mallarino et al., 1996; Tomer
et al., 1995) or on maize and soybeans (Khakural et al., 1996; Cambardella et al., 1996;
Kessler and Lowenberg-DeBoer, 1998; Sudduth et al., 1996). However, when these
studies correlated yield with soil properties or terrain attributes, either they explained
only a small fraction of yield variability (Everett and Pierce, 1996; Kessler and
Lowenberg-DeBoer, 1998; Mallarino et al., 1996; Sudduth et al., 1996), or the
relationships were not consistent across years (Braga, 2000; Tomer et al., 1995).

These  results  suggest  that  purely  statistical  approaches  may  have  descriptive  value, 
but  are  not  appropriate  for  predictive  purposes.  Interannual  variability  of  weather,  the 
spatial  variability  of  soil  properties,  and  the  landscape-position-dependence  of  other 
processes  create  a  dynamic  environment  for  plant  growth.  The  problem  of  separating  the 
effects  of  different  environmental  factors  using  statistical  techniques  is  especially 
complex  because  crop  yield  in  a  field  is  controlled  by  numerous  concurrent  factors. 
Furthermore, plant susceptibility to a particular factor may depend on the crop's
developmental stage and on other environmental conditions, and in field experiments the
number of variables can easily exceed the number of available observations.

Crop  simulation  models  have  been  used  as  a  powerful  analytic  tool  to  understand 
environmental  influences  on  crop  yield.  They  provide  the  unique  opportunity  to  account 
for  many  interacting  yield-influencing  factors  in  ways  that  are  impossible  with  traditional 
agronomic  experimentation.  Crop  models  have  often  been  used  to  analyze  the  causes  of 
temporal,  weather-  and  climate-related  yield  variability  (Boote  et  al.,  1996;  Ferreyra  et 
al.,  2001;  Parry  et  al.,  1999),  and  have  recently  also  been  used  for  understanding  spatial 
yield  variability  (Batchelor  et  al.,  2002;  Braga,  2000;  Irmak  et  al.,  2001;  Paz  and 
Batchelor,  2000;  Paz  et  al.,  1998;  Sadler  et  al.,  2000).  Crop  simulation  models'  ability  to 
reproduce  both  temporal  and  spatial  crop  yield  variability  suggests  that  they  may  be  ideal 
tools  for  diagnostic  and  prescriptive  use  in  precision  agriculture. 

However,  the  quality  of  crop-model-based  diagnostic  analyses  and  prescriptions 
depends  on  the  accuracy  with  which  the  models'  parameters,  or  values  that  represent 
characteristics  of  the  model  and  remain  constant  throughout  a  simulation  (Jones  and 
Luyten, 1998), are determined. Measuring soil water-holding parameters is time-consuming
(Klute, 1986) and expensive (a typical value is US$20 for testing one soil
water-holding limit on one soil sample; see A&L Labs, 2003), but given accurate field
measurements,  crop  simulation  results  can  capture  variability  well,  as  shown  by  Braga 
(2000)  with  the  CERES-Maize  model  (Ritchie  et  al.,  1998).  Conversely,  if  parameter 
values  are  taken  from  coarse  estimates  such  as  soil  survey  data,  the  model  may  perform 
poorly,  as  found  by  Sadler  et  al.  (2000)  using  the  same  model. 


The enormous cost of sampling soil hydraulic parameters at a spatial density
adequate for using crop models in precision agriculture has motivated the search for
ways to estimate the parameters instead of measuring them. (Henceforth, we refer to crop
models used for simulating spatiotemporal variability as spatial crop models.)
Inverse  modeling  (IM)  is  an  estimation  method  suitable  for  use  with  crop  models.  It  uses 
the  model  itself  and  a  search  algorithm  to  propose  parameter  values.  Welch  et  al.  (1999a) 
used  an  IM-based  method  for  estimating  crop  model  genetic  coefficients.  The  method 
exhaustively  simulated  all  the  parameter  combinations  in  a  discrete  input  space  (the 
parameter  space),  and  then  examined  the  results  to  find  the  best  parameter  combination 
(or  parameter  set)  for  each  crop  variety.  The  "best"  parameter  set  was  defined  as  the  one 
producing  the  lowest  value  of  an  objective  function:  the  sum  of  squared  residuals 
between  simulated  and  observed  data.  Irmak  et  al.  (2001)  expanded  the  grid  search 
concept  for  estimating  soil  parameters. 
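The grid-search form of inverse modeling described above can be sketched as follows. This is a minimal illustration only, not the implementation of Welch et al. (1999a) or Irmak et al. (2001): the function toy_yield_model is a hypothetical stand-in for a crop model run, and the parameter names, grid values, and rainfall figures are assumptions chosen for the example.

```python
from itertools import product


def toy_yield_model(rooting_depth_cm, drainage_rate, seasonal_rain_mm):
    """Hypothetical stand-in for one crop model run (NOT CROPGRO or CERES).

    Water retained after drainage is capped by a storage capacity that
    grows with rooting depth; yield is proportional to retained water.
    """
    retained_mm = min(seasonal_rain_mm * (1.0 - drainage_rate),
                      2.0 * rooting_depth_cm)
    return 10.0 * retained_mm  # arbitrary yield units


def sum_squared_residuals(simulated, observed):
    """Objective function: sum of squared simulated-minus-observed residuals."""
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))


def grid_search(depth_levels, drainage_levels, rain_by_year, observed_yields):
    """Simulate every parameter combination and keep the lowest objective."""
    best_params, best_ssr = None, float("inf")
    for depth, drainage in product(depth_levels, drainage_levels):
        simulated = [toy_yield_model(depth, drainage, rain)
                     for rain in rain_by_year]
        ssr = sum_squared_residuals(simulated, observed_yields)
        if ssr < best_ssr:
            best_params, best_ssr = (depth, drainage), ssr
    return best_params, best_ssr


if __name__ == "__main__":
    rain = [300.0, 500.0, 700.0]  # assumed seasonal rainfall, three years
    # Synthetic "observations" generated from known parameters (100 cm, 0.4)
    observed = [toy_yield_model(100, 0.4, r) for r in rain]
    print(grid_search([50, 100, 150], [0.2, 0.4, 0.6], rain, observed))
```

Because the search enumerates the whole discrete parameter space, its cost is the product of the number of levels of each parameter, which is why the size of that space matters in the discussion that follows.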

Using  inverse  modeling  to  parameterize  a  crop  model  is  not  without  problems. 
When  the  observed  crop  yield  has  been  affected  by  a  factor  not  considered  by  the  crop 
model,  the  inverse  modeling  algorithm  will  attempt  to  explain  the  effect  using  soil 
properties.  Most  crop  models  do  not  account  for  yield-limiting  factors  such  as  pests, 
weeds, diseases, nutrients, and extreme pH. Thus, using IM to match predicted and
observed yields may misattribute yield losses to soil properties and produce incorrect
parameter values.

Extending the models to simulate the effects of additional factors is possible:
Fallick et al. (2002), Paz et al. (2001), and Irmak et al. (2002) extended crop models to
include soil pH, soybean cyst nematodes, and weed effects. In most practical
applications, however, quantitative knowledge about the effect of extraneous factors on
yield is imperfect, which reduces confidence in the model results. Additionally, quantifying the
factor  itself  (e.g.,  site-specific  degree  of  nematode  infestation)  may  be  difficult  or 
impractical.  Uncertainty  in  observed  yield,  due  to  extraneous  factors  or  yield  monitor 
measurement  errors  (Lark  et  al.,  1997;  Whelan  and  McBratney,  1997)  thus  implies  the 
possibility  of  error  in  the  parameter  estimates,  with  the  consequent  degradation  in  the 
quality  of  the  model's  predictions. 

Moreover,  to  date  there  have  been  few  efforts  at  using  crop  simulation  models  for 
describing  how  the  spatial  variability  of  soil  water  content  influences  crop  yield. 
Considering  that  drought-related  stresses  typically  limit  crop  growth  in  the  world's  major 
crop-producing  regions,  spatial  water  distribution  is  an  important  consideration  when 
applying  precision  agriculture  in  those  regions. 

Paz  and  Batchelor  (2000)  used  the  CROPGRO  model  (Boote  et  al.,  1998)  to 
analyze  spatial  soybean  yield  variability  in  a  field,  attributing  it  to  the  spatially  variable 
effects  of  plant  density,  weeds,  soybean  cyst  nematodes,  and  water  stress.  Previously,  Paz 
et  al.  (1998)  explained  the  influence  of  water  stress  in  terms  of  rooting  depth  (loosely 
equivalent  to  a  soil  water  holding  capacity  parameter),  and  a  drainage  parameter,  either 
the  saturated  hydraulic  conductivity  (KSAT)  or  a  soil  drainage  rate  coefficient  (SLDR), 
both  of  which  control  the  rate  at  which  saturated  soil  drains  fully  down  to  its  drained 
upper  limit.  These  authors  noted  that  their  modeling  scheme's  ability  to  reproduce 
observed  data  degraded  in  low-lying  areas,  possibly  due  to  the  model's  inability  to 
account  for  run-on  or  sub-surface  flow  from  neighboring  areas.  Indeed,  given  that  these 
processes  were  not  considered,  the  parameterization  procedure  tried  to  explain  excess 
water across the field in terms of slow drainage or increased rooting depth. This might
affect the predictive capability of this procedure when the model is used in years not used
for parameter calibration.

If  spatial  crop  models  accounted  for  three-dimensional  water  movement  over  the 
landscape,  they  could  possibly  better  explain  the  causes  of  water-stress-induced  yield 
variability. This would require including a three-dimensional water balance
simulation across the landscape in the spatial crop model. Such a complete model does
not currently exist, and its parameterization by inverse modeling could prove
computationally intractable.

We  will  use  the  term  spatially-coupled  crop  model  to  denote  a  spatial  crop  model 
in  which  landscape  units  interchange  water  and  nutrients.  Each  landscape  unit  (or  cell)  in 
such  a  model  cannot  be  parameterized  independently  using  IM,  because  variation  of  the 
parameters  of  one  cell  could  affect  the  availability  of  water,  and  thus  the  yield,  of  another 
cell.  In  principle,  such  a  model  demands  a  simultaneous  parameterization  of  all  the  cells. 
However,  the  parameter  space  size  (i.e.,  the  number  of  unique  parameter  combinations 
among  which  the  optimum  must  be  found)  would  then  grow  exponentially  with  the 
number of cells into which the field of interest was divided. Also, it is not clear whether
the uncertainty in the crop model parameter estimates would be amplified when
propagated through a spatially-coupled model.
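The exponential growth mentioned above can be made concrete with a small calculation. The sketch below assumes a discrete parameter space in which every cell has the same number of parameters and every parameter takes the same number of levels; the function names and the figures used (10 levels, 2 parameters, 10 cells) are illustrative assumptions, not values taken from this study.

```python
def joint_space_size(levels, params_per_cell, n_cells):
    """Combinations to examine when all cells are parameterized at once,
    as a spatially-coupled model would in principle demand."""
    return levels ** (params_per_cell * n_cells)


def independent_space_size(levels, params_per_cell, n_cells):
    """Combinations to examine when each cell is parameterized on its own,
    as is possible with an uncoupled model."""
    return n_cells * levels ** params_per_cell


if __name__ == "__main__":
    levels, params_per_cell, n_cells = 10, 2, 10  # assumed example values
    print(joint_space_size(levels, params_per_cell, n_cells))
    print(independent_space_size(levels, params_per_cell, n_cells))
```

Under these assumptions the simultaneous parameterization involves 10^20 combinations, while the cell-by-cell parameterization involves 1,000, which is why an exhaustive search is only practical in the uncoupled case.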

An  alternative  scheme  is  to  reduce  the  complexity  of  the  problem  by  representing 
the  spatial  variability  of  crop  yield  with  geostatistical  techniques  (Goovaerts,  1997),  and 
applying  a  spatial  interpolation  algorithm  to  the  yield  values  simulated  independently  in  a 
limited  number  of  locations.  The  parameterization  of  an  uncoupled  model  (i.e.,  one  in 
which the cells are not spatially coupled) would then be modified to improve its ability to
capture the behavior of the crop at these locations of interest. To avoid the confounding
influence  of  the  aforementioned  extraneous  factors,  this  process  would  have  to  rely  on 
additional  information. 

Spatiotemporal  yield  variability  can  be  simulated  accurately  when  the  spatial  and 
temporal  behavior  of  the  primary  yield-limiting  factor  (typically  soil  water)  is  simulated 
properly  (Braga,  2000;  Calmon  et  al.,  1999;  Ferreyra,  1998).  Direct  measurements  of  soil 
water  content  are  not  appropriate  for  practical,  extensionist-  or  consultant-driven 
applications  of  crop  models  in  farm  decision  support  (R.  Murdock,  pers.  comm.);  but 
other  sources  of  knowledge  are  available,  typically  in  the  form  of  expert  opinion. 
Farmers, extension agents, crop consultants, and other experts such as Natural Resources
Conservation Service (NRCS) soil scientists possess a wealth of knowledge about the
dominant behavior of different parts of the field, including wetness, soil properties, weed
pressure,  and  so  on.  Eliciting  this  knowledge  is  relatively  inexpensive,  but  harnessing  it 
in  a  way  that  can  be  valuable  for  parameterizing  a  spatial  crop  simulation  model  is  a 
challenging  endeavor;  it  is  also  the  problem  that  motivates  this  dissertation. 

Goal  and  Objectives 

The  goal  of  this  study  was  to  develop  new  approaches  and  practical  solutions  for 
the  problem  of  spatial  crop  model  parameterization  for  precision  agriculture  applications 
in  which  soil  water  availability  is  the  primary  yield-limiting  factor.  Its  specific  objectives 
are  the  following: 

1. To identify and quantitatively compare different sources of error in the use of
inverse  modeling  to  parameterize  spatially  coupled  and  uncoupled  crop  models. 

2.  To  develop  methods  for  optimizing  spatial  sampling  schemes  for  representing  the 
spatiotemporal variability of yield and yield-limiting factors.

3. To develop and evaluate a portable framework for eliciting knowledge from experts
and using that knowledge to parameterize a spatial crop model.

Outline  of  the  Dissertation 

This  dissertation  contains  six  components  that  address  the  above  listed  objectives: 
Chapters  2  to  7.  Each  chapter  is  self-contained  and  can  be  read  independently.  Chapter  2 
addresses  the  first  objective;  Chapters  3  and  4  address  the  second;  Chapters  5  to  7  address 
the  third. 

Chapter  2  compares  three  different  sources  of  error  in  the  use  of  inverse  modeling 
to  parameterize  a  spatial  crop  model:  a)  spatially-coupled  vs.  uncoupled  model;  b)  lack  of 
knowledge  about  initial  conditions;  and  c)  biases  in  weather  data  used  for 
parameterization.  We  used  inverse  modeling  to  parameterize  a  simple  spatial  crop  model 
based  on  CROPGRO,  exploring  different  scenarios  built  from  combinations  of  different 
levels  of  the  errors  mentioned  above.  For  the  simulations  we  used  weather  and  soils  data 
from  a  water-limited  environment  in  Cordoba,  Argentina. 

Chapter  3  develops  solutions  to  the  problem  of  concurrently  obtaining  an  optimal 
spatial  sampling  scheme  for  a  phenomenon  of  interest  (e.g.,  yield)  and  an  optimal  closed 
scouting  path  that  links  the  locations  of  the  sampling  scheme.  This  problem  is 
characteristic  of  crop  scouting,  and  is  relevant  for  the  observation  of  both  crop  yield  and 
the  level  of  yield-affecting  factors  in  a  field.  Chapter  3  also  explores  the  problem  in  the 
context  of  minimal  data  requirements. 

Chapter  4  elaborates  on  the  concept  of  spatial  sampling  scheme  optimization, 
applying  it  to  the  soil  water  content  domain.  We  compared  different  algorithms  and 
objective  functions  for  obtaining  the  best  spatiotemporal  predictive  capability  with  a 
given number of locations, and explored the predictive limits of geostatistical techniques
in a landscape in which spatial water movement is believed to occur. For this chapter we
used a soil water dataset from Cordoba, Argentina: 5 dates of soil water content
observations in 57 locations on an isometric grid covering an 8.3-hectare
microwatershed.

Chapter  5  revisits  the  search  algorithm  used  by  Irmak  et  al.  (2001)  for 
inverse-modeling-based  parameterization.  These  authors  compared  an  exhaustive  grid 
search  method  with  the  sophisticated  adaptive  simulated  annealing  algorithm  proposed  by 
Ingber  (1993)  and  used  by  Braga  (2000),  Calmon  et  al.  (1999),  and  Paz  et  al.  (1998)  for 
estimating  crop  model  soil  parameters.  Irmak  et  al.  (2001)  claimed  better  performance 
using  the  former.  We  developed  a  hybrid  between  the  simulated  annealing  algorithm  and 
a  grid  search.  Our  algorithm  is  more  efficient  than  the  grid  search  and  capable  of 
providing  tentative  solutions  that  can  be  progressively  updated. 

Chapter  6  introduces  Bayesian  networks  as  tools  for  representing  knowledge  about 
causal  relationships  and  for  combining  different  sources  of  knowledge  into  a  common 
probabilistic framework. It shows a simple example of how we used Bayesian network
technology to help understand the causes of, and predict, spatiotemporal yield variability
in a field in Kentucky during discussions involving domain experts: the farmer, an NRCS
soil scientist, and crop consultants.

Chapter  7  builds  on  the  preceding  chapters.  We  used  a  two-tiered  approach  to  the 
inverse  modeling  parameterization  problem.  The  lower  tier  is  a  network  of  spatial 
relationships  of  crop  model  inputs  or  outputs  among  the  locations  of  interest,  elicited 
from  interaction  with  domain  experts.  The  top  tier  is  the  (uncoupled)  crop  model,  run  for 
each location of interest. The spatially-coupled lower tier constrains the behavior of the
uncoupled top tier; the problem remains computationally tractable, despite its large
parameter space, because evaluating the spatial constraints is very fast compared to a
single crop model run, and because we used the fast algorithm developed in Chapter 5.
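One way to picture why cheap spatial constraints keep the search tractable is to screen each candidate parameter set against the constraints before any crop model run is made. The sketch below is only an illustration under stated assumptions and is not the framework developed in Chapter 7: the location names, the single "no deeper than" relationship, the exhaustive enumeration, and the run_model callable (a stand-in for one uncoupled crop model run) are all hypothetical.

```python
from itertools import product


def satisfies_constraints(candidate, constraints):
    """Cheap check of expert-elicited spatial relationships.

    candidate:   dict mapping location name -> rooting depth (cm)
    constraints: list of (a, b) pairs meaning depth at a <= depth at b
    """
    return all(candidate[a] <= candidate[b] for a, b in constraints)


def constrained_search(locations, levels, constraints, run_model, observed):
    """Enumerate candidates, running the model only on feasible ones."""
    best, best_ssr, evaluated = None, float("inf"), 0
    for values in product(levels, repeat=len(locations)):
        candidate = dict(zip(locations, values))
        if not satisfies_constraints(candidate, constraints):
            continue  # rejected without any crop model run
        evaluated += 1
        ssr = sum((run_model(candidate[loc]) - observed[loc]) ** 2
                  for loc in locations)
        if ssr < best_ssr:
            best, best_ssr = candidate, ssr
    return best, best_ssr, evaluated


if __name__ == "__main__":
    locations = ["ridge", "swale"]            # hypothetical locations
    constraints = [("ridge", "swale")]        # assumed expert statement
    observed = {"ridge": 500.0, "swale": 1500.0}
    result = constrained_search(locations, [50, 100, 150], constraints,
                                lambda depth: 10.0 * depth, observed)
    print(result)
```

In this toy case the constraint removes three of the nine candidates before any simulation; with more locations and more relationships the share of candidates rejected cheaply would be expected to grow.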

In  Chapter  7  we  also  explored  a  case  study  using  the  scheme  mentioned  above  to 
parameterize  and  test  a  spatial  crop  model  based  on  CERES-Maize  in  a  field  in 
Kentucky,  USA.  Available  data  included  real-time  kinematic  GPS-derived  elevation  data, 
three  years  of  corn  yield  maps,  one  year  of  wheat  and  soybean  yield  maps,  SCS  Soil 
Survey data, soil electrical conductivity data, and in situ soil probe observations and soil
water  content  time  series,  coupled  with  expert  opinions. 

Finally,  in  Chapter  7  we  also  applied  the  abovementioned  network  of  spatial 
relationships  to  constrain  the  IM  parameterization  of  a  simple,  spatially-coupled,  CERES- 
Maize-based  crop  model  in  the  same  field  in  Kentucky. 


CHAPTER  2 

SOURCES  OF  ERROR  WHEN  INVERSE  MODELING  IS  APPLIED  TO  THE 
PARAMETERIZATION  OF  SPATIALLY-COUPLED  CROP  MODELS 

Introduction 

Agricultural  production  of  crops  such  as  corn,  soybeans,  and  wheat  currently  poses 
economic  and  environmental  problems.  Inflation-adjusted  agricultural  commodity  prices 
decreased steadily through the 20th Century (USDA NASS, 1994). This contributed to
making agricultural production less profitable and riskier. Moreover, increasing levels
of  environmental  regulatory  pressure  limit  farmers'  risk-mitigating  management  options. 
These  limits  are  manifested  through  groundwater-quality-driven  limits  on  fertilizer  and 
pesticide  applications,  through  market  limitations  on  the  use  of  genetically  modified 
organisms,  and  through  competition  between  agricultural  and  urban  water  use. 

Precision  agriculture  in  general,  and  variable  rate  application  technology  in 
particular,  has  shown  promise  for  addressing  both  economic  and  environmental  concerns 
of  agricultural  production  (National  Research  Council,  1997).  From  an  economic 
perspective,  farmers  could  conceivably  maximize  net  returns  by  boosting  yields  in  areas 
where  crop  growth  can  respond  to  additional  inputs.  Additionally,  the  use  of  fertilizers, 
pesticides,  lime,  etc.  could  be  minimized  in  low-yielding  areas  where  crop  growth  is 
limited  by  factors  beyond  the  farmer's  control.  From  an  environmental  viewpoint,  it 
would  be  possible  to  approach  the  ideal  situation  in  which  all  the  inputs  applied  to  the 
crop  would  actually  be  consumed  by  it,  leaving  none  free  to  contaminate  the  environment 
(Pierce and Nowak, 1999).

Variable rate application technology is controlled by spatially variable
prescriptions,  i.e.  site-specific  dosages  of  crop  inputs  (Morgan  and  Ess,  1997).  Making 
these  prescriptions  requires  understanding  the  causes  of  spatial  yield  variability,  as  well 
as  the  sensitivity  of  crop  yield  to  the  application  of  specific  inputs. 

Crop  models  have  been  used  as  analytical  tools  to  understand  environmental 
influence  on  crop  yield.  Crop  models  provide  a  unique  opportunity  to  account  for 
numerous  factors  influencing  yield  in  ways  that  are  impossible  with  traditional 
agronomic  experimentation.  Models  have  often  been  used  to  analyze  causes  of  temporal 
yield variability related to weather and climate (Boote et al., 1996; Messina et al., 1999;
Rosenzweig and Iglesias, 1998), and have recently also been used for understanding
spatial yield variability (Irmak et al., 2001; Paz and Batchelor, 2000; Paz et al., 1998).
Crop  simulation  models'  ability  to  reproduce  both  temporal  and  spatial  crop  yield 
variability  suggests  that  they  may  be  ideal  tools  for  diagnostic  and  prescriptive  use  in 
precision  agriculture. 

However,  to  date  there  have  been  few  efforts  at  using  crop  simulation  models  for 
describing  how  the  spatial  variability  of  soil  water  content  influences  crop  yield. 
Considering  that  drought-related  stresses  typically  limit  crop  growth  in  the  world's  major 
crop-producing  regions,  spatial  water  distribution  is  an  important  consideration  when 
applying  precision  agriculture  in  those  regions. 

Paz and Batchelor (2000) used the CROPGRO model (Boote et al., 1998) to
analyze spatial soybean yield variability in a field, attributing it to the spatially variable
effects of plant density, weeds, soybean cyst nematodes, and water stress. Previously, Paz
et al. (1998) explained the influence of water stress in terms of rooting depth (loosely
equivalent to a soil water holding capacity parameter), and a drainage parameter, either
the  saturated  hydraulic  conductivity  (KSAT)  or  a  soil  drainage  rate  coefficient  (SLDR), 
both  of  which  control  the  rate  at  which  saturated  soil  drains  fully  down  to  its  drained 
upper  limit.  These  authors  noted  that  their  modeling  scheme's  ability  to  reproduce 
observed  data  degraded  in  low-lying  areas,  possibly  due  to  the  model's  inability  to 
account  for  run-on  or  sub-surface  flow  from  neighboring  areas.  Indeed,  given  that  these 
processes  were  not  considered,  the  parameterization  procedure  tried  to  explain  excess 
water  across  the  field  in  terms  of  slow  drainage  or  increased  rooting  depth.  This  might 
affect  the  predictive  capability  of  this  procedure  when  the  model  is  used  in  years  not  used 
for  parameter  calibration. 

Crop-modeling efforts in precision agriculture might explain the causes of
water-stress-induced yield variability more reliably if spatial water movement were
explicitly considered, i.e. if the model simulated the coupling, or interchange of water
and possibly nutrients, between different landscape locations. This would require a
three-dimensional water balance simulation across the landscape. Such a complete
spatially-coupled  crop  model  does  not  currently  exist.  However,  a  simple  approximation 
can  be  used  to  test  our  working  hypothesis,  implicit  in  the  literature  to  date,  that  an 
appropriate  selection  of  parameters  can  allow  an  uncoupled  model,  i.e.  one  in  which  the 
different  landscape  units  do  not  interchange  water,  to  reproduce  the  spatiotemporal 
variability  of  simulated  yield  produced  by  a  spatially-coupled  model.  The  specific 
objectives  of  our  study  were: 

1. To develop a simple spatially-coupled water balance model, and use it to generate
synthetic  crop  yield  maps. 

2.  To  determine  under  which  conditions,  if  any,  spatially-coupled  and  uncoupled 
models might produce similar results.

3. To estimate, using spatially-coupled and uncoupled models with inverse modeling
(IM)  techniques,  the  soil  parameters  of  the  spatially-coupled  model,  and  quantify 
the  errors  incurred. 

4.  To  quantify  the  error  of  prediction  incurred  when  using  the  different  sets  of  soil 
parameters  resulting  from  objective  3  to  predict  yields  in  years  not  used  for 
parameterization. 

Materials  and  Methods 
Generating  Synthetic  Yield  Maps:  the  Spatially-Coupled  Model 

We  simulated  a  soybean  crop  across  a  spatially  variable  field  using  the 
CROPGRO-Soybean  model  (Boote  et  al.,  1998).  We  chose  soybeans  because  the 
CROPGRO  model  reproduces  crop  responses  to  multiple  environmental  factors  such  as 
temperature  and  day  length,  as  well  as  the  effects  on  plant  growth  of  both  insufficient  and 
excessive  soil  water  content.  The  latter  two  are  of  special  interest  in  this  study,  since  they 
can  be  expected  to  show  spatial  variability  across  agricultural  fields. 


Figure  2-1.  Landscape  model  used  for  the  spatially-coupled  model  simulations.  Note  that 
the  cells  only  communicate  via  surface  flow;  subsurface  lateral  flow  was 
assumed  to  be  insignificant. 

We made a simple spatial extension to CROPGRO as shown in Figure 2-1. We
assumed an agricultural field with significant topographical variation along one
dimension and little or no variation along the other. We approximated the field with a
toposequence (Figure 2-1) composed of several (10) cells that represent parallel
whole-field-long swaths of some arbitrary width. This is equivalent to a sloping field
having straight, parallel contour lines.

The  model  can  be  configured  so  the  cells  behave  in  one  of  two  ways  during  rainfall 
events: 

•  Spatially-coupled: the surface of each cell receives water from rainfall and from
   cells uphill of it. Water outputs from the cell surface are infiltration and runoff
   to the cells downhill of it.

•  Uncoupled: the surface of each cell receives water only from rainfall, with no
   contribution from neighboring cells. Runoff is assumed to be lost and does not
   contribute water to neighboring cells.

In  both  cases,  we  partitioned  water  inputs  into  infiltration  and  runoff  using  the  SCS 
Curve  Number  Method  (USDA  SCS,  1972),  which  we  chose  due  to  its  simplicity  and 
popularity,  and  because  it  is  already  built  into  CROPGRO.  A  detailed  explanation  of  the 
method  is  shown  below  to  make  subsequent  work  clearer. 

The curve number method can be derived starting from a mass balance equation
applied to a storm event

$$R = P_e - I \tag{2-1}$$

where R (mm) is runoff, I (mm) is the amount of infiltration and surface retention, and Pe
(mm) is a term called effective precipitation. In the SCS method this is defined as the
amount of rainfall that can contribute to runoff, equal to the rainfall amount exceeding an
initial abstraction Ia (mm), which is the amount of precipitation necessary before runoff
can begin. Thus,

$$P_e = P - I_a \tag{2-2}$$

where P (mm) is precipitation. The critical assumption in the curve number method is

$$\frac{R}{P_e} = \frac{I}{S} \tag{2-3}$$

where S is the potential maximum retention (in mm). Equation 2-3 stipulates that the
ratio of runoff to the effective rainfall is the same as the ratio of actual retention to S
(Boughton, 1989). Operating with Equation 2-1 yields

$$I = P_e - R \tag{2-4}$$

A second assumption of the method is that

$$I_a = 0.2 \cdot S \tag{2-6}$$

Replacing  Equation  2-6  into  Equation  2-5,  and  considering  the  definition  of  Ia 
yields  the  SCS  runoff  equation: 
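Written out, the substitution described above leads to the standard textbook form of the
SCS runoff equation. The following sketch of that algebra uses the symbols defined above
and assumes P > Ia:

```latex
\begin{aligned}
R &= \frac{P_e^{2}}{P_e + S}
  && \text{(from } R/P_e = I/S \text{ and } I = P_e - R\text{)} \\
  &= \frac{(P - I_a)^{2}}{P - I_a + S}
  && \text{(using } P_e = P - I_a\text{)} \\
  &= \frac{(P - 0.2\,S)^{2}}{P + 0.8\,S}
  && \text{(using } I_a = 0.2\,S\text{)}
\end{aligned}
```

For P ≤ 0.2 S the storm does not exceed the initial abstraction and R = 0. For S = 0 the
expression reduces to R = P, i.e. all rainfall runs off.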
The potential retention parameter S is usually expressed in terms of a dimensionless
runoff curve number CN through the expression

$$S = \frac{25400}{CN} - 254$$

with S in mm.

We ran the modified version of CROPGRO successively for the 10 cells, starting
downward from the highest cell in the toposequence. The first (highest) cell never
received runon, since it is assumed that there are no positions higher in the landscape that
contribute water to it. In the spatially-coupled model, each one of the successively lower
cells received daily runon equal to the total daily runoff in the cell immediately above it.
The main assumptions are:

1. The only relevant processes in the landscape are rainfall P, infiltration I, and runoff
(or runon) R.

2.  The  total  runoff  volume  from  cell  i-1  equals  the  total  volume  of  runon  to  cell  i. 

3.  The  runon  to  a  cell  can  be  considered  to  generate  runoff  as  if  it  were  additional 
rainfall  (i.e.  that  we  can  use  the  SCS  runoff  equation  to  estimate  the  runoff  from  a 
cell,  taking  as  rainfall  input  the  sum  of  rainfall  and  runon  in  the  cell). 

4.  The  total  runoff  volume  from  the  field  is  equivalent  to  the  total  runoff  volume  from 
the  last  cell. 

5. The field is assumed to be partitioned into N cells of equal area, with each cell
having an area Ap = A / N, where A is the total area of the field and N is the total
number of cells into which it is partitioned.

Assuming P > Ia, mass balance for an individual cell i can be expressed as:

$$A_i P_i + A_{i-1} R_{i-1} = A_i I_i + A_i R_i + A_i I_{a,i} \tag{2-10}$$

where $P_i$ is the rainfall on the cell, $R_{i-1}$ is the runon to the cell, equal to the runoff
from the previous cell i-1, $I_i$ is the infiltration, $I_{a,i}$ is the initial abstraction and
$R_i$ is the runoff. This equation is expressed in terms of total volumes of water. However,
if all the cells have an equal area $A_p$, Equation 2-10 can be divided by $A_p$ and the
equation will then be expressed in volume of water per unit area. Infiltration can thus be
expressed as follows:

$$I_i = P_i + R_{i-1} - R_i - I_{a,i} \tag{2-11}$$


Runoff can be calculated directly using the SCS runoff Equation 2-9, modified to
add runon to the precipitation on a cell (and assuming that the areas of all cells are equal):

$$R_i = \frac{(P_i + R_{i-1} - 0.2\,S_i)^2}{P_i + R_{i-1} + 0.8\,S_i} \tag{2-12}$$

where $S_i$ is the maximum retention term. Finally, replacing Equation 2-8 into 2-12:

We used Equations 2-11 and 2-12 to calculate the runoff and infiltration for each
cell. We began from the top of the slope because cell 1 receives no runon; consequently,
the value of $R_1$ could be calculated directly using Equation 2-9. We moved downward
from cell 1 to the last cell N, calculating, on a daily basis, $R_i$ and then $I_i$ for each cell.
This assumes that the field is steep enough for all the runoff to leave the field in one day.
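The downhill pass described above can be sketched in a few lines of Python. This is a
minimal illustration rather than the code used in the study: the function names are
hypothetical, the 50 mm storm is made up, and the conversion from curve number to
retention uses the standard SCS relation S = 25400/CN - 254 (S in mm), which is assumed
here to be the expression the text refers to. The sketch handles a single day and ignores
any adjustment of the nominal curve number for antecedent soil water.

```python
def retention_mm(cn):
    """Potential maximum retention S (mm) for a curve number (standard SCS relation)."""
    return 25400.0 / cn - 254.0


def scs_runoff(water_in, s):
    """Runoff (mm) from a water input (mm) given retention S (mm).

    No runoff occurs until the input exceeds the initial abstraction Ia = 0.2 S.
    """
    ia = 0.2 * s
    if water_in <= ia:
        return 0.0
    return (water_in - ia) ** 2 / (water_in + 0.8 * s)


def daily_water_partition(rain, curve_numbers, coupled=True):
    """Partition one day's water for cells ordered from the top of the slope down.

    Returns two lists: runoff leaving each cell, and water retained by each cell
    (infiltration plus initial abstraction), both in mm.
    """
    runoff, retained = [], []
    runon = 0.0  # the topmost cell never receives runon
    for cn in curve_numbers:
        water_in = rain + runon
        r = scs_runoff(water_in, retention_mm(cn))
        runoff.append(r)
        retained.append(water_in - r)
        # Coupled: this cell's runoff becomes the next cell's runon.
        # Uncoupled: runoff is lost, so the next cell sees rainfall only.
        runon = r if coupled else 0.0
    return runoff, retained


# CN2 values of the ten cells in Table 2-1, and a hypothetical 50 mm storm.
CN2 = [93.0, 93.3, 94.0, 94.7, 95.0, 95.0, 95.0, 94.88, 94.76, 94.65]
coupled_runoff, coupled_retained = daily_water_partition(50.0, CN2, coupled=True)
uncoupled_runoff, uncoupled_retained = daily_water_partition(50.0, CN2, coupled=False)
```

With `coupled=False` the same loop reproduces the uncoupled configuration, so the two
runs differ only in whether runoff is passed downhill.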

We  ran  the  model  for  30  years  of  historic  weather  data.  Assuming  that  the  spatially- 
coupled  crop  model  is  a  perfect  predictor  of  crop  yield,  this  procedure  created  a  synthetic 
yield  map  for  each  weather  year  and  parameter  pattern.  Yield  varied  spatially,  i.e.  across 
the  cells,  due  to  differences  in  soil  properties  from  cell  to  cell  and  differences  in  spatial 
soil  water  distribution. 

It was not necessary to modify the crop model itself to obtain the additional spatial
functionality (daily runon is added to each cell's precipitation input), but it was
convenient to write a simple shell program to invoke the successive model runs and
manage the input and output data of the cells of the toposequence.

Generating  Synthetic  Yield  Maps:  Input  Data 

Our study site represents conditions near Cordoba (Argentina). Cordoba (31° 29' S,
64° 13' W) lies on the northwestern edge of the Pampas region of Argentina. Mean
annual rainfall between 1966 and 1995 was 844 mm, mostly concentrated in the spring
and summer. The soil is a Typic Haplustoll, a deep silty loam with high water holding
capacity and no limitations to drainage (Dardanelli et al., 1997).

However,  soils  in  the  region  have  poor  structural  stability  (Chagas  et  al.,  1995)  and 
are  thus  subject  to  crusting  and  high  runoff  during  the  high-intensity  thunderstorms 
characteristic of the summer. The soil properties adopted for the simulations are shown in
Table 2-1. These parameters were derived from expert opinion and field measurements in
a microwatershed in the location of interest (H. Apezteguia, Pers. Comm.). Note how
there is a small variation of the parameters along the toposequence, responding to a
somewhat greater presence of clay particles downslope. We used CROPGRO genetic
coefficients corresponding to a variety that has been used widely in the region, Asgrow
5406 (S. Meira and E. Guevara, Pers. Comm.). We used a planting date of November 10,
the modal date for the region (J. Dardanelli, Pers. Comm.).

Soil  water  content  at  planting  depends  on  factors  such  as  tillage,  weed 
management,  the  weather  in  the  months  leading  up  to  planting,  and  the  water  extraction 
pattern  of  the  previous  crop.  Soil  water  can  strongly  influence  final  crop  yield,  especially 
in  areas  like  Cordoba,  where  drought  periods  are  frequent  during  the  growing  season.  We 
simulated  the  interannual  variability  of  initial  soil  water  content  by  sampling  from  a 
distribution  of  2970  synthetic  water  content  values  at  planting  simulated  for  the  region  by 
Ferreyra et al. (2001). The mean value was approximately 100 mm of available water,
representative of field measurements in the region (Ferreyra, 1998).

Table 2-1. Soil parameters used for the spatially-coupled crop model.

Cell                      1      2      3      4      5      6      7      8      9     10
Depth^a (cm)            210    210    210    210    210    210    210    210    210    210
SLDR^b                 0.60   0.60   0.59   0.58   0.57   0.56   0.55   0.54   0.53   0.52
CN2^c                 93.00  93.30  94.00  94.70  95.00  95.00  95.00  94.88  94.76  94.65
KSAT^d (cm/day)           4      4   4.22   4.22   4.22   4.22   4.22   4.21      4   3.72
LL02^e (cm3/cm3)      0.113  0.113  0.113  0.113  0.113  0.113  0.113  0.117  0.121  0.125
LL03^e (cm3/cm3)      0.103  0.103  0.103  0.103  0.103  0.103  0.103  0.107  0.111  0.115
LL04^e (cm3/cm3)      0.097  0.097  0.097  0.097  0.097  0.097  0.097  0.101  0.105  0.109
LL05^e (cm3/cm3)      0.097  0.097  0.097  0.097  0.097  0.097  0.097  0.101  0.105  0.109
LL06^e (cm3/cm3)      0.099  0.099  0.099  0.099  0.099  0.099  0.099  0.103  0.107  0.111
LL07^e (cm3/cm3)      0.103  0.103  0.103  0.103  0.103  0.103  0.103  0.107  0.111  0.115
LL08^e (cm3/cm3)      0.101  0.101  0.101  0.101  0.101  0.101  0.101  0.105  0.109  0.113
LL09^e (cm3/cm3)      0.099  0.099  0.099  0.099  0.099  0.099  0.099  0.103  0.107  0.111
LL10^e (cm3/cm3)      0.099  0.099  0.099  0.099  0.099  0.099  0.099  0.103  0.107  0.111
DUL01^f (cm3/cm3)     0.321  0.321  0.321  0.321  0.321  0.321  0.321  0.325  0.329  0.333
DUL02^f (cm3/cm3)     0.294  0.294  0.294  0.294  0.294  0.294  0.294  0.298  0.302  0.306
DUL03^f (cm3/cm3)     0.267  0.267  0.267  0.267  0.267  0.267  0.267  0.271  0.275  0.279
DUL04^f (cm3/cm3)     0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.254  0.258  0.262
DUL05^f (cm3/cm3)     0.247  0.247  0.247  0.247  0.247  0.247  0.247  0.251  0.255  0.259
DUL06^f (cm3/cm3)     0.245  0.245  0.245  0.245  0.245  0.245  0.245  0.249  0.253  0.257
DUL07^f (cm3/cm3)     0.245  0.245  0.245  0.245  0.245  0.245  0.245  0.249  0.253  0.257
DUL08^f (cm3/cm3)     0.245  0.245  0.245  0.245  0.245  0.245  0.245  0.249  0.253  0.257
DUL09^f (cm3/cm3)     0.245  0.245  0.245  0.245  0.245  0.245  0.245  0.249  0.253  0.257
DUL10^f (cm3/cm3)     0.245  0.245  0.245  0.245  0.245  0.245  0.245  0.249  0.253  0.257
SAT01^g (cm3/cm3)     0.488  0.488  0.488  0.488  0.488  0.488  0.488  0.492  0.496  0.500
SAT02^g (cm3/cm3)     0.487  0.487  0.477  0.477  0.477  0.477  0.477  0.477  0.487  0.497
SAT03^g (cm3/cm3)     0.476  0.476  0.466  0.466  0.466  0.466  0.466  0.466  0.476  0.486
SAT04^g (cm3/cm3)     0.431  0.431  0.421  0.421  0.421  0.421  0.421  0.421  0.431  0.441
SAT05^g (cm3/cm3)     0.403  0.403  0.393  0.393  0.393  0.393  0.393  0.393  0.403  0.413
SAT06^g (cm3/cm3)     0.386  0.386  0.376  0.376  0.376  0.376  0.376  0.376  0.386  0.396
SAT07^g (cm3/cm3)     0.385  0.385  0.375  0.375  0.375  0.375  0.375  0.375  0.385  0.395
SAT08^g (cm3/cm3)     0.385  0.385  0.375  0.375  0.375  0.375  0.375  0.375  0.385  0.395
SAT09^g (cm3/cm3)     0.385  0.385  0.375  0.375  0.375  0.375  0.375  0.375  0.385  0.395
SAT10^g (cm3/cm3)     0.385  0.385  0.375  0.375  0.375  0.375  0.375  0.375  0.385  0.395

a: Maximum rooting depth of the soil profile.
b: Soil drainage rate. Controls the rate at which a saturated layer drains to its DUL.
c: Nominal season-long CN value used in the SCS runoff curve number method.
d: Saturated hydraulic conductivity.
e: Lower limit of soil water holding capacity for the ten soil layers.
f: Drained upper limit of soil water holding capacity for the ten soil layers.
g: Saturation soil water content for the ten soil layers.

Estimating Soil Parameters Using Inverse Modeling

We  estimated  three  soil  parameters  (detailed  below)  using  IM,  and  compared  the 
estimates  with  the  known  parameter  values  (Table  2-1).  We  traversed  the  toposequence 
downhill,  estimating  soil  parameters  for  each  cell.  We  searched  exhaustively  over  the  3- 
dimensional  parameter  space  of  each  cell,  as  per  Irmak  et  al.  (2001).  As  in  that  study, 
each  cell's  optimal  parameter  combination  was  the  one  that  minimized  an  objective 
function  defined  as  the  root  mean  squared  error  of  that  cell's  yield  prediction  over  several 
years.  We  detail  the  number  of  years  and  their  selection  below. 
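The exhaustive search can be sketched as follows. The sketch assumes a hypothetical
callable `simulate_yield(params, year)` that wraps one crop model run for one cell and
returns its yield; neither that name nor any grid values come from the original study.

```python
import itertools
import math


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def exhaustive_search(simulate_yield, observed_yield, grids):
    """Try every parameter combination and keep the one with the lowest RMSE.

    simulate_yield: hypothetical callable (params: dict, year) -> yield
    observed_yield: dict mapping year -> observed (here, synthetic) yield
    grids:          dict mapping parameter name -> list of candidate values
    """
    names = list(grids)
    years = list(observed_yield)
    observed = [observed_yield[y] for y in years]
    best_params, best_error = None, math.inf
    for combo in itertools.product(*(grids[name] for name in names)):
        params = dict(zip(names, combo))
        predicted = [simulate_yield(params, y) for y in years]
        error = rmse(predicted, observed)
        if error < best_error:  # ties keep the first combination found
            best_params, best_error = params, error
    return best_params, best_error
```

Because ties keep the first combination found, a parameter that barely affects yield may
end up at a value determined by the order of its grid rather than by the data.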

We varied three parameters: the nominal season-long CN value used in the SCS
runoff curve number method (USDA SCS, 1972), CN2; the saturated hydraulic
conductivity of the bottom soil layer, KSAT; and the fraction of nominal maximum
available water, FAW. The latter was used to modify the soil water holding characteristics
of the whole profile using only one parameter: we defined FAW as the ratio between each
soil layer's estimated maximum available water and the true maximum available water
for that layer. The maximum available water is defined as (DUL - LL), where DUL and
LL are the drained upper limit and lower limit of soil water holding capacity,
respectively (Ritchie, 1981). We kept the LL of each soil layer at its true value, and
modified the DUL according to the FAW value (FAW = 1 makes the DUL take its real
value, FAW = 0.5 takes DUL halfway between the real value of DUL and the LL, etc.).
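As a small illustration of how a single FAW value rescales the whole profile, the
following sketch (the function name is hypothetical) recomputes the DUL of each layer
while leaving LL untouched. The layer values are those of layers 2 to 4 of cell 1 in
Table 2-1.

```python
def apply_faw(ll, dul, faw):
    """Return the DUL profile implied by FAW, keeping each layer's LL fixed.

    ll, dul: per-layer lower limit and true drained upper limit (cm3/cm3)
    faw:     fraction of the true maximum available water (DUL - LL)
    """
    return [lower + faw * (upper - lower) for lower, upper in zip(ll, dul)]


# Layers 2 to 4 of cell 1 in Table 2-1.
LL = [0.113, 0.103, 0.097]
DUL = [0.294, 0.267, 0.250]
half = apply_faw(LL, DUL, 0.5)  # each DUL moves halfway toward its LL
full = apply_faw(LL, DUL, 1.0)  # reproduces the true DUL values
```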

We  classified  weather  years  according  to  water  availability  during  the  season, 
expressed  as  the  sum  of  initial  soil  water  content  and  rainfall  during  the  season.  We 
called  this  variable  TSW  (total  seasonal  water),  and  used  it  to  rank  the  30  available 
weather  years.  We  sampled  four  years  from  each  tercile  of  the  TSW  distribution:  four 
"dry" years, four "normal" years, and four "wet" years, and used them to define
parameterization cases.
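The ranking and splitting step can be sketched as below. The function name and the TSW
values are made up for illustration; only the procedure (rank the years by TSW, then
split them into three equal groups) follows the text.

```python
def tsw_terciles(tsw_by_year):
    """Rank years by total seasonal water and split them into three equal groups.

    tsw_by_year: dict mapping year -> TSW (mm), i.e. initial soil water plus
                 seasonal rainfall. The number of years is assumed to be a
                 multiple of three, as with the 30 years used here.
    """
    ranked = sorted(tsw_by_year, key=tsw_by_year.get)
    third = len(ranked) // 3
    return {
        "dry": ranked[:third],
        "normal": ranked[third:2 * third],
        "wet": ranked[2 * third:],
    }


# Illustrative TSW values only (mm); these are not the study's data.
example_tsw = {1966 + k: 500.0 + 37.0 * ((7 * k) % 30) for k in range(30)}
terciles = tsw_terciles(example_tsw)
```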

We defined three different cases based on the TSW of the crop years used for
parameterization. The first case was an unbiased benchmark, consisting of two years
from each of the three TSW terciles, for a total of six years. The second case was biased
toward "dry" years, and consisted of four "dry" years and two "normal" years. The third
case was biased toward "wet" years, and consisted of four "wet" and two "normal" years.
We chose to use six years of weather for each case, 2-3 times the number used for other
recent studies (Batchelor et al., 2002; Irmak et al., 2001; Paz et al., 1998), to minimize
the possibility of overfitting.

Figure 2-2 shows the weather years chosen for the three weather cases, and the
TSW of each. Note how the total TSW range exceeds 400 mm, and how the unbiased case
shares three years with each of the other cases.

Figure 2-2. Parameterization weather cases. Each row of points describes a weather case.
The filled circles are years from the lower TSW tercile, the open squares are
from the middle TSW tercile, and the filled triangles are from the upper tercile.
The label on each point shows the year, e.g. 85 corresponds to 1985.

An important source of crop modeling error may arise from lack of knowledge of
initial soil water conditions. To explore this aspect we defined two initial condition cases:
having perfect knowledge about the initial soil water conditions, and having no
knowledge. In the latter we used the median (over the 30 available water years) soil water
content value, approximately 100 mm.

In  summary,  there  were  three  sources  of  uncertainty  in  parameter  estimation: 

•  The  imperfect  crop  model  when  using  the  uncoupled  model. 

•  Biased  weather  cases  in  the  IM  process. 

•  The  lack  of  knowledge  about  initial  soil  water  conditions. 

Combining the possible states of these sources of uncertainty led to 12 distinct IM
parameter estimation scenarios, defined by 3 weather cases (benchmark, dry-biased,
wet-biased) × 2 models (spatially-coupled, uncoupled) × 2 initial condition cases (initial
conditions known or unknown). One of these scenarios, the one using the benchmark
weather case, the spatially-coupled model, and known initial conditions, was expected to
reproduce the real soil parameters most closely.

Evaluating With Independent Data

After estimating soil parameters for the 12 IM scenarios, we tested how well the
corresponding parameter sets estimated yields for the 18 weather years not involved in
each scenario's parameter estimation process. We tried this in two ways: having perfect
knowledge about the initial conditions and with no knowledge thereof, similarly to the
cases described above. We stratified the results by each year's TSW tercile. We did this to
show whether the spatially-coupled and uncoupled models' predictive performance
varied according to the relationship between the weather case used for parameter
estimation and the weather (as described by the TSW tercile) used for evaluation.
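The twelve parameter-estimation scenarios whose parameter sets are evaluated here are a
plain Cartesian product of weather case, model, and initial-condition knowledge. The
sketch below enumerates them; the labels are shorthand chosen for illustration.

```python
import itertools

WEATHER_CASES = ("benchmark", "dry-biased", "wet-biased")
MODELS = ("spatially-coupled", "uncoupled")
INITIAL_CONDITIONS = ("known", "unknown")

# Every combination of weather case, model, and initial-condition knowledge.
scenarios = list(itertools.product(WEATHER_CASES, MODELS, INITIAL_CONDITIONS))

# The scenario expected to recover the true soil parameters most closely.
reference = ("benchmark", "spatially-coupled", "known")
```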

Results and Discussion
Exploring  Similar  Behavior  Between  Spatially-Coupled  and  Uncoupled  Models 

The conditions under which the spatially-coupled and uncoupled models can
behave similarly can be analyzed using the top two cells of the toposequence. Equation
2-13 describes runoff for an arbitrary cell in the toposequence of the spatially-coupled
model, and the particular case of the topmost cell, which receives no runon, can be
described using Equation 2-9. Replacing the expression of the runon entering cell 2 with
Equation 2-9, i.e. the runoff from cell 1, and assuming that rainfall is constant throughout
the toposequence and greater than the greater Ia of the two cells, runoff from the second
cell can be expressed as follows:

$$R_2 = \frac{\left[P + \dfrac{(P - 0.2\,S_1)^2}{P + 0.8\,S_1} - 0.2\,S_2\right]^2}{P + \dfrac{(P - 0.2\,S_1)^2}{P + 0.8\,S_1} + 0.8\,S_2} \tag{2-14}$$

where $S_1$ and $S_2$ are the retention parameters for the first (topmost) and second cells,
respectively.

Equation 2-9 describes runoff in any cell of the uncoupled model. For the
uncoupled model to substitute for the spatially-coupled model, for any realistic
values of $S_1$ and $S_2$ in the spatially-coupled model there should exist a value $S_{EQ2}$ and
its corresponding curve number $CN_{EQ2}$ that predict, using the uncoupled model, the same
input of water into the soil of cell 2, I + Ia, as that in cell 2 of the spatially-coupled
model. This should be valid for any realistic environmental conditions, i.e. rainfall. Based
on Equation 2-11, and assuming that $P + R_1 > 0.2\,S_2$ and $P > 0.2\,S_{EQ2}$, the following
should be true:

$$P + \frac{(P - 0.2\,S_1)^2}{P + 0.8\,S_1} - \frac{\left[P + \dfrac{(P - 0.2\,S_1)^2}{P + 0.8\,S_1} - 0.2\,S_2\right]^2}{P + \dfrac{(P - 0.2\,S_1)^2}{P + 0.8\,S_1} + 0.8\,S_2} = P - \frac{(P - 0.2\,S_{EQ2})^2}{P + 0.8\,S_{EQ2}} \tag{2-15}$$

We solved Equation 2-15 for the $S_{EQ2}$ of the simpler problem in which $S_1 = S_2 = S$,
i.e. $CN_1 = CN_2 = CN$, and obtained the following solutions:

$$S_{EQ2A} = \frac{-8S^3 + 290PS^2 - 300P^2S + 1500P^3 + 2.8284\sqrt{(8S^3 - 545PS^2 - 150P^2S - 2250P^3)(S - 5P)^3}}{2\,(50P^2 + 30PS + 17S^2)} \tag{2-16a}$$

$$S_{EQ2B} = \frac{-8S^3 + 290PS^2 - 300P^2S + 1500P^3 - 2.8284\sqrt{(8S^3 - 545PS^2 - 150P^2S - 2250P^3)(S - 5P)^3}}{2\,(50P^2 + 30PS + 17S^2)} \tag{2-16b}$$

Of the two solutions, only Equation 2-16b is valid. Given that both solutions
produce the same water input results, the right-hand side of Equation 2-15 evaluates to
the same numerical value for both solutions; however, the P - 0.2 SEQ2 term is negative
for Equation 2-16a. Although it produces the same value as Equation 2-16b when
squared, it does not have a physical meaning. As shown in Equation 2-9, runoff should be
0 when P < 0.2 SEQ2.
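The equivalent retention can also be obtained numerically without expanding the closed
form. The sketch below is an illustration under stated assumptions, not the authors'
code: it takes S1 = S2 = S, uses the standard relation S = 25400/CN - 254 (mm), and
solves the quadratic in SEQ2 that results from requiring the uncoupled cell 2 to generate
runoff Q = R2 - R1. Of the two roots it keeps the smaller one, for which P - 0.2 SEQ2 is
positive. All function names are hypothetical.

```python
import math


def scs_runoff(water_in, s):
    """SCS runoff (mm); zero until the input exceeds Ia = 0.2 S."""
    ia = 0.2 * s
    return 0.0 if water_in <= ia else (water_in - ia) ** 2 / (water_in + 0.8 * s)


def equivalent_retention(p, s):
    """Retention S_EQ2 (mm) for which uncoupled cell 2 stores as much water
    as coupled cell 2, assuming both coupled cells share retention s.

    Requires p > 0.2 * s; otherwise no runoff occurs and S_EQ2 is not unique.
    """
    if p <= 0.2 * s:
        raise ValueError("no runoff is generated: equivalent retention is not unique")
    r1 = scs_runoff(p, s)        # runoff from cell 1 = runon to cell 2
    r2 = scs_runoff(p + r1, s)   # runoff from coupled cell 2
    q = r2 - r1                  # runoff the uncoupled cell 2 must produce
    # (p - 0.2 x)^2 = q (p + 0.8 x)  ->  x^2 - (10 p + 20 q) x + 25 p (p - q) = 0
    return 5.0 * p + 10.0 * q - 5.0 * math.sqrt(q * (5.0 * p + 4.0 * q))


def equivalent_cn(p, cn):
    """Curve number corresponding to equivalent_retention (standard CN relation)."""
    s = 25400.0 / cn - 254.0
    return 25400.0 / (equivalent_retention(p, s) + 254.0)
```

Evaluating `equivalent_cn` over a range of P for a fixed CN gives the kind of curve
plotted in Figure 2-3: the result depends on the storm size, so no single curve number
can match the coupled model for every storm.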

Figure 2-3 shows the values of CNEQ2 corresponding to the SEQ2 of Equation 2-16b,
for different combinations of P and CN. Note how CNEQ2 initially decreases with
increasing rainfall, reflecting the additional contribution of runon to infiltration through a
smaller CN. For greater values of P, CNEQ2 increases asymptotically toward CN, and this
trend begins for lower values of P as CN increases. This behavior can be understood by
differentiating Equation 2-14 with respect to P. Defining the auxiliary terms shown by
Equation 2-17, the derivative results in Equation 2-18 as follows:

$$B = \frac{P - 0.2\,S_1}{P + 0.8\,S_1}, \qquad C = \frac{(P - 0.2\,S_1)^2}{P + 0.8\,S_1}, \qquad D = \frac{P + C - 0.2\,S_2}{P + C + 0.8\,S_2} \tag{2-17}$$

$$\frac{dR_2}{dP} = \left[1 + 2B - B^2\right]\left[2D - D^2\right] \tag{2-18}$$


In Equation 2-18, the factors 2B - B^2 and 2D - D^2 each tend to 1 as P tends to
infinity, and as S tends to 0, i.e. as CN tends to 100. As they tend to 1, CNEQ2 will
tend to CN because the effect of any runon from upslope (and thus, the effect of a
spatially-coupled model) becomes irrelevant as the additional water is fully lost to runoff.
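The analytical derivative can be checked numerically against a finite-difference
derivative of Equation 2-14. The sketch below does this for one arbitrary combination of
P, S1 and S2; the function names and test values are illustrative assumptions, and the
auxiliary terms are taken with S1 in B and C and S2 in D.

```python
def runoff_cell2(p, s1, s2):
    """Runoff from coupled cell 2 (Equation 2-14), assuming p > 0.2 * s1."""
    c = (p - 0.2 * s1) ** 2 / (p + 0.8 * s1)  # runoff from cell 1
    return (p + c - 0.2 * s2) ** 2 / (p + c + 0.8 * s2)


def runoff_cell2_slope(p, s1, s2):
    """Analytical dR2/dP written with the auxiliary terms B, C and D."""
    b = (p - 0.2 * s1) / (p + 0.8 * s1)
    c = (p - 0.2 * s1) ** 2 / (p + 0.8 * s1)
    d = (p + c - 0.2 * s2) / (p + c + 0.8 * s2)
    return (1.0 + 2.0 * b - b ** 2) * (2.0 * d - d ** 2)


# Central finite difference at an arbitrary point (values in mm).
p, s1, s2, h = 40.0, 20.0, 15.0, 1e-4
numeric = (runoff_cell2(p + h, s1, s2) - runoff_cell2(p - h, s1, s2)) / (2.0 * h)
analytic = runoff_cell2_slope(p, s1, s2)
```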

Equation  2-15  and  Figure  2-3  relate  to  our  objective  of  determining  under  which 
conditions,  if  any,  the  spatially-coupled  and  uncoupled  model  might  produce  similar 
results.  The  only  case  for  which  CNEQ2  remains  constant  for  different  rainfall  amounts  is 
CN1 = CN2 = 100 (not shown in Figure 2-3), which is useless in an agricultural
environment  because  it  is  associated  with  zero  infiltration,  as  shown  by  Equation  2-7  for 
S  =  0.  In  more  practical  scenarios,  it  would  be  impossible  to  exactly  reproduce  the  water 
infiltration  regime  of  a  spatially-coupled  model  using  an  uncoupled  model;  a  crop  season 
includes  many  rainfall  events,  each  with  its  own  rainfall  amount,  and  the  number  of 
storms  and  rainfall  amounts  varies  from  year  to  year. 

The  relevance  in  terms  of  crop  yield  of  this  water-specific  conclusion  will  vary 
from  year  to  year  and  will  depend  on  the  CN  in  question.  For  very  wet  years,  as  well  as 
for  soils  associated  with  high  curve  numbers  or  intense,  convective  storms  during  the 
cropping  season,  it  may  be  irrelevant.  Contrarily,  it  may  be  very  important  in  water- 
limited  environments  with  intermediate  infiltration  and  storms  with  moderate  rainfall 
amounts.  Furthermore,  it  is  possible  that  other  errors  such  as  initial  conditions  or  weather 
biases might be more relevant to yield than model errors due to uncoupling. These
aspects are further explored below.

Figure 2-3. Values of CNEQ2 (curve number in toposequence cell 2 of the uncoupled
model that produces equivalent infiltration to cell 2 of the spatially-coupled
model) for different combinations of rainfall and curve number values in cells
1 and 2 (CN1, CN2) of the spatially-coupled model. Horizontal axis: rainfall,
0 to 100 mm; one curve each for CN1 = CN2 = 70, 75, 80, 85 and 90.

Simulated  Yield  Profiles 

Figure  2-4  shows  a  histogram  of  the  initial  soil  water  content  used  in  this  study. 
Plant  extractable  soil  water  to  a  depth  of  210  cm  ranged  from  slightly  over  30  mm  to 
slightly  under  190  mm,  reflecting  the  effects  of  interannual  weather  variability  during 
antecedent  crop  cycles.  The  former  case  could  be  expected  following  a  crop  that  extracts 
water  from  very  deep  in  the  profile,  such  as  sunflower,  and  a  subsequent  dry  winter 
typical  for  the  region;  the  latter  can  reflect  conditions  after  a  short-season  maize  (which 
can leave water deep in the profile), followed by good spring rains prior to planting.

Figure 2-4. Histogram of PESW in available initial condition set.

Figure 2-5. Simulated yield profiles made with the spatially-coupled (left) and uncoupled
(right) models. The results of the uncoupled model show the effect of spatial
variability of soil properties; the spatially-coupled model results show the
additional effect of spatial water movement.

Figure 2-5 shows simulated yield for the spatially-coupled (left column) and
uncoupled  (right  column)  models  using  the  soil  properties  shown  in  Table  2-1.  Each  row 
corresponds  to  a  tercile  of  the  TSW  distribution;  the  dry  tercile  is  on  the  top,  the  wet 
tercile  on  the  bottom.  Each  boxplot  shows  the  results  of  10  years  of  simulations.  The  10 
cells  shown  per  boxplot  are  arranged  in  progressively  lower  landscape  positions  from  left 
to  right.  The  results  of  the  uncoupled  model  show  the  influence  of  the  spatial  variability 
of  soil  properties  shown  in  Table  2-1.  The  spatially-coupled  model  results  additionally 
include  the  effects  of  spatial  water  movement.  Note  how  the  uncoupled  model  results 
change  from  cell  to  cell,  especially  in  the  middle  and  wet  weather  terciles.  This  is 
primarily  a  result  of  spatial  CN2  variability;  despite  its  apparently  small  variation 
throughout  the  toposequence,  from  93  to  95,  infiltration  is  greatly  affected  by  these  small 
variations in the upper CN2 range, i.e. the lower S range. This becomes clear by
differentiating Equation 2-14 with respect to CN1 or CN2 (not shown).

Central  Argentina  is  a  water-limited  environment  for  soybean  growth;  note  how  the 
yield  at  the  top  of  the  slope  has  a  median  value  of  slightly  over  2000  kg/ha,  and  increases 
significantly  downslope  given  that  the  lower  cells  have  more  water  available  from  runon. 
The  effect  is  less  noticeable  in  dry  years  because  there  is  less  rain  and  consequently,  less 
runoff  /  runon. 

Inverse  Modeling:  Parameter  Estimation  Error 

Figures  2-6  and  2-7  show  the  results  of  using  IM  and  a  search  algorithm  to  find  the 
parameter  combination  that  best  fits  the  "observed"  yield  patterns  of  Figure  2-5.  Only 
CN2  and  FAW results  are  shown  because  large  changes  in  KSAT did  not  produce  changes 
in  yield.  This  happens  because  precipitation  rarely  exceeds  evapotranspiration  during  the 
growing  season  in  Cordoba.  When  it  does,  the  high  water  holding  capacity  of  the  Entic 
Haplustoll makes it highly improbable that the drainage implementation of the
CROPGRO  water  balance  module,  which  only  begins  moving  water  out  of  a  layer  when 
the  layer's  water  content  exceeds  its  DUL,  could  drain  beyond  a  depth  of  210  cm.  This  is 
consistent with the lack of a limiting horizon in Entic Haplustolls.

Figure 2-6. CN2 values obtained by IM for the spatially-coupled (filled circles) and
uncoupled (open circles) models. The left column shows the case of perfect
knowledge of initial conditions, and the right shows the case in which only the
average initial soil water content is known. The three rows correspond to the
three calibration weather cases. The filled circles in the left column coincide
with the actual parameter values.


There  are  twelve  scenarios  defined  by  the  combinations  of  model,  knowledge  of 
initial  conditions,  and  IM  weather  case.  In  all  the  scenarios  the  results  provided  by  the 
spatially-coupled  and  uncoupled  models  coincide  in  the  first,  uppermost  cell  of  the  slope. 
This occurs because the uppermost cell in the spatially-coupled model receives no runon
and  thus  behaves  identically  to  the  corresponding  cell  of  the  uncoupled  model. 


Figure 2-7. FAW values obtained by IM for the spatially-coupled (filled circles) and
uncoupled (open circles) models. The left column shows the case of perfect
knowledge of initial conditions, and the right shows the case in which only the
average initial soil water content is known. The three rows correspond to the
three IM weather cases. Horizontal axis: cell.


The  spatially-coupled  model  faithfully  reproduced  its  own  parameters  at  all 
landscape  positions  when  the  initial  conditions  were  known,  as  shown  by  the  CN2  values 
(filled  circles  of  the  left  column  of  Figure  2-6)  being  equal  to  those  of  Table  2-1,  and  by 
the  estimated  FAW  being  1  for  all  cells.  However,  the  spatially-coupled  model  had  some 
difficulty when initial conditions were unknown, especially with the FAW parameter
calibrated in wetter years and wetter (downslope) cells in which water limitation was not
a  major  problem. 

We hypothesized that the erratic FAW estimates under unknown initial conditions
corresponded  to  a  lower  sensitivity  of  the  FAW  parameter  relative  to  CN2,  as  expressed 
by  comparing  the  sensitivity  coefficient  (Hamby,  1994)  of  each  parameter  across 
different  years  and  landscape  positions.  Since  the  results  of  a  sensitivity  analysis  may  be 
greatly  dependent  on  the  chosen  base  case  (Atherton  et  al.,  1975;  Gardner  et  al.,  1981), 
we  used  the  base  case  defined  by  the  parameter  values  shown  in  Table  2-1,  so  the  results 
represent  the  behavior  of  the  parameters  around  the  optimal  parameter  estimate.  The 
coefficient  is  defined  as: 

$$\phi_i = \frac{\Delta Y}{\Delta X_i} \cdot \frac{X_i}{Y} \tag{2-19}$$

where $X_i$ is the base case value of the ith parameter, $\Delta X_i$ is a deviation of the parameter
with respect to its base case, $Y$ is the base case yield value, and $\Delta Y$ is the yield deviation
corresponding to the $\Delta X_i$ deviation.
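
As a minimal sketch of how Equation 2-19 can be evaluated numerically, the Python snippet below perturbs one parameter at a time around a base case and normalizes the yield response. The function `toy_yield`, its coefficients, the base values, and the 1% perturbation are hypothetical stand-ins for a crop model run; they are not CROPGRO and do not reproduce the values of Table 2-1.

```python
def sensitivity_coefficient(model, base_params, name, rel_step=0.01):
    """Normalized sensitivity (delta Y / delta X_i) * (X_i / Y) around a base case."""
    x_base = base_params[name]
    y_base = model(base_params)
    delta_x = rel_step * x_base
    perturbed = dict(base_params)
    perturbed[name] = x_base + delta_x
    delta_y = model(perturbed) - y_base
    return (delta_y / delta_x) * (x_base / y_base)


def toy_yield(params):
    # Hypothetical linear response: strong dependence on CN2, weak dependence on FAW.
    return 5000.0 - 40.0 * (params["CN2"] - 70.0) + 20.0 * (params["FAW"] - 1.0)


base = {"CN2": 80.0, "FAW": 1.0}
s_cn2 = sensitivity_coefficient(toy_yield, base, "CN2")
s_faw = sensitivity_coefficient(toy_yield, base, "FAW")
print(f"CN2: {s_cn2:.4f}  FAW: {s_faw:.6f}")
```

Because the coefficient is dimensionless, values for parameters with different units (such as CN2 and FAW) can be compared directly. With this toy response the CN2 coefficient is roughly two orders of magnitude larger in absolute value than the FAW coefficient; the contrast is built into the toy function and is not derived from the study's simulations.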

Figures 2-8 and 2-9 show sensitivity results at different landscape positions for
CN2 and FAW, respectively, in the spatially-coupled and uncoupled models. Sensitivity is
weather dependent, somewhat soil property dependent (see right column of Figure 2-8),
and landscape position dependent. Note that the sensitivity of CN2 is two orders of
magnitude greater than that of FAW. The high CN2 sensitivity was explained previously;
the low FAW sensitivity is linked to the high water holding capacity of the
soil, and the same holds for KSAT. Ferreyra et al. (2001) noted that, when using the
CERES model (Ritchie et al., 1998) in the Cordoba region, a large entry of water into the
lower soil layers happened very infrequently. This occurs in CERES and CROPGRO
because these models' water balance simulation does not move water downward from a
layer until its soil water content has surpassed its drained upper limit, which rarely happens in
soils with a high water holding capacity and high runoff. Consequently, FAW can affect
neither the total amount of water available to the crop nor the timing of its availability
around a base case in which FAW = 1.

Figure 2-8. CN2 sensitivity coefficient for different landscape positions and models. The
points  and  whiskers  show  means  and  standard  deviations  of  coefficients 
calculated  for  the  6  years  of  the  unbiased  scenario  of  Figure  2-2. 


Figure 2-9. FAW sensitivity coefficient for different landscape positions and models. The
points  and  whiskers  show  means  and  standard  deviations  of  coefficients 
calculated  for  the  6  years  of  the  unbiased  scenario  of  Figure  2-2. 

The uncoupled model tried to compensate for its lack of runon contributions by
estimating progressively lower CN2 values downhill (Figure 2-6). According to Equation
2-9, this would result in lower runoff losses and, hence, more infiltration. However, the
uncoupled model's CN2 compensation attempt was only partially successful (Figure
2-10); the error increased downhill. As demonstrated above, although CN2 could
conceivably be modified for each cell in the uncoupled model so that the results of Equations
2-9 and 2-13 coincide for a given storm (or the yields of the two models coincide for a
given year), the nonlinearity of Equation 2-13 with respect to P for different CN values
makes it practically impossible for a single set of CN2 values in the uncoupled model to
reproduce the results of the spatially-coupled model over several years.
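
The argument can be illustrated numerically. The sketch below assumes the textbook SCS curve-number relation (depths in mm, initial abstraction of 0.2 S) for both models and a two-cell slope in which the upper cell's runoff is added to the lower cell's precipitation. The exact forms of Equations 2-9 and 2-13, the storm depths, and the CN value of 80 are assumptions made here for illustration only. An uncoupled CN is fitted by bisection so that infiltration in the lower cell matches the coupled value for a 50 mm storm, and the same CN is then applied to a 100 mm storm.

```python
def scs_runoff(p_mm, cn):
    """Textbook SCS curve-number runoff (mm) for a storm of depth p_mm."""
    s = 25400.0 / cn - 254.0   # potential maximum retention, mm
    ia = 0.2 * s               # initial abstraction (assumed 0.2 * S)
    return (p_mm - ia) ** 2 / (p_mm + 0.8 * s) if p_mm > ia else 0.0


def coupled_lower_infiltration(p_mm, cn):
    """Infiltration in the lower cell when runoff from the cell above arrives as runon."""
    runon = scs_runoff(p_mm, cn)
    water_in = p_mm + runon
    return water_in - scs_runoff(water_in, cn)


def matching_cn(p_mm, cn, lo=1.0, hi=100.0, iters=80):
    """Bisection for the uncoupled CN whose infiltration equals the coupled value."""
    target = coupled_lower_infiltration(p_mm, cn)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if p_mm - scs_runoff(p_mm, mid) > target:
            lo = mid   # too much infiltration: a higher CN is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)


cn_true = 80.0
cn_fit = matching_cn(50.0, cn_true)   # fitted on a single 50 mm storm
for storm in (50.0, 100.0):
    coupled = coupled_lower_infiltration(storm, cn_true)
    uncoupled = storm - scs_runoff(storm, cn_fit)
    print(f"P={storm:5.1f} mm  coupled={coupled:6.2f}  uncoupled={uncoupled:6.2f}")
```

Under these assumptions the fitted CN is lower than the true value and reproduces the coupled infiltration for the calibration storm, but not for the larger storm, which is the behavior attributed above to the nonlinearity in P.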

Figure 2-10. Yield RMSE for the twelve parameterization scenarios (6 years per
scenario). Filled circles represent the spatially-coupled model; open circles
represent the uncoupled model.

The spatially-coupled model behaved differently: RMSE was 0 for all cells when
initial  conditions  were  known.  This  was  expected  because  the  IM  algorithm  converged  to 
the  original  parameters.  However,  RMSE  increased  with  uncertain  initial  conditions, 
especially  for  the  wetter  cases. 

These  results  suggest  a  very  limited  capacity  of  the  uncoupled  model  to  predict 
reality,  especially  when  the  weather  years  used  for  parameter  estimation  are  similar. 
However,  the  spatially-coupled  model's  error  also  increased  under  uncertain  initial 
conditions.  We  explored  this  further  with  the  evaluation  data  set. 

Evaluating  With  Independent  Data 

Figures 2-11 to 2-14 show the RMSE for several different evaluation scenarios.
The 18 evaluation years were split by TSW tercile, and each tercile's results are shown in
a separate column. Each point represents 6 years.

These  results  only  correspond  to  scenarios  having  uncertain  initial  conditions 
(i.e.,  the  values  shown  in  the  right  column  of  Figures  2-6  and  2-7)  since  measuring  the 
initial  conditions  at  the  parameter  estimation  phase  is  currently  not  practical  in  precision 
agriculture  modeling  applications  (R.  Murdock,  Pers.  Comm.).  For  reference,  however, 
the  RMSE  of  the  IM  scenario  using  the  spatially-coupled  model  and  full  knowledge  of 
initial  conditions,  evaluated  with  full  knowledge  of  initial  conditions,  is  zero  for  the  ten 
cells  in  all  combinations  of  IM  weather  cases  and  evaluation  TSW  tercile. 

When  the  evaluation  initial  conditions  are  known  (Figure  2-11),  the  prediction 
RMSE  of  the  spatially-coupled  model  is  minimum  for  the  dry-biased  IM  weather  case, 
increasing  towards  the  wet-biased  IM  case,  especially  for  the  cells  at  the  bottom  of  the 
toposequence of the dry tercile. This happens because under very dry conditions, crop
yield responds strongly to changes in infiltration, initial conditions are less variable, and
the  CN2  parameter  is  consequently  estimated  more  accurately. 

Conversely,  the  wet-biased  weather  case  (bottom  right  of  Figures  2-6  and  2-7) 
produces  the  poorest  parameter  estimates  because  water  does  not  limit  the  crop's  growth 
in  some  of  the  years,  especially  in  the  lower  cells  (see  bottom  left  panel  of  Figure  2-5). 
The  parameter  estimation  process  thus  fits  parameters  to  explain  the  variability  of  initial 
conditions  rather  than  the  crop's  response  to  weather;  this  results  in  spurious  parameter 
values  (bottom  right  panels  of  Figures  2-6  and  2-7). 

The  spatially-coupled  model's  prediction  error  in  evaluation  simulations  increases 
when  the  initial  conditions  are  unknown  (Figure  2-12).  The  increase  is  most  noteworthy 
when  using  parameters  obtained  with  the  dry-biased  weather  case.  This  is  due  to  the 
impact  of  uncertainty  in  the  knowledge  of  initial  conditions,  which  explains  why  the  runs 
corresponding to the central tercile of evaluation TSW are the most affected: in the dry
tercile, the variability of the initial conditions is lower; in the wet tercile, the
impact of that variability is lower.

Figure  2-13  shows  evaluation  results  for  the  uncoupled  model  under  perfect 
knowledge  of  initial  conditions.  The  patterns  are  similar  to  those  shown  in  Figure  2-10, 
with  error  increasing  downslope  as  the  decreased  curve  number  fails  to  properly  capture 
the  intra-annual  and  inter-annual  variability  of  spatial  water  movement  of  the  spatial 
model.  However,  for  the  wet-biased  IM  case,  the  prediction  error  downslope  is  actually 
less  than  in  the  spatially-coupled  model  (Figure  2-11),  because  errors  in  parameter 
estimation  cannot  compound  downslope  in  the  uncoupled  model. 

Figure 2-11. Evaluation yield RMSE for the spatially-coupled model when IM initial
conditions are unknown and evaluation initial conditions are known. Each
point represents six years.

Figure 2-12. Evaluation yield RMSE for the spatially-coupled model when both the IM
and evaluation initial conditions are unknown. Each point represents six years.

Figure 2-13. Evaluation yield RMSE for the uncoupled model when IM initial conditions
are unknown and evaluation initial conditions are known. Each point
represents six years.

Figure 2-14. Evaluation yield RMSE for the uncoupled model when both the IM and
evaluation initial conditions are unknown. Each point represents six years.

Figure 2-14 (uncoupled model, unknown initial conditions) shows results similar to
those of Figure 2-13, except for the increased prediction RMSE of the dry-biased
scenarios due to the uncertainty in initial conditions already mentioned for Figure 2-12.
Ultimately, the results shown for the spatially-coupled model (Figure 2-12) and the
uncoupled model (Figure 2-14) are very similar, suggesting that with
uncertain initial conditions and biased IM weather cases, errors in the spatial coupling of
the model are not the primary cause of yield prediction error.

The spatially-coupled model has several caveats. It does not consider subsurface
flow; it calculates runoff using the SCS method (as does the uncoupled model), which has
little or no physical basis (Boughton, 1989); it assumes that runoff leaves the field in one
day; and it does not consider the increasing complexity of the runoff hydrograph downslope
as runoff contributions from uphill cells arrive with different time lags. Also, because it adds
all the runoff from a cell to the precipitation of its immediate downslope neighbor, it
implicitly assumes that runoff travels downslope in sheet form. However, these
simplifications do not negate the effects of the three aforementioned sources of error, and
thus do not detract from the central findings of this study.

Conclusions 

The  literature  to  date  on  the  inverse  modeling  based  parameterization  of  spatial 
crop  models  is  dominated  by  uncoupled  models.  We  studied  three  possible  sources  of 
error  for  such  models:  model  error  from  lack  of  spatial  coupling  and  water  transport 
among  different  landscape  locations,  parameter  error  from  biased  weather  in  the  years  of 
yield  data  used  for  the  parameterization  process,  and  errors  due  to  lack  of  knowledge  of 
initial  soil  water  conditions.  Each  of  these  sources  of  error  impacted  spatiotemporal  yield 
prediction  capability. 

With respect to model error, we showed analytically that the spatiotemporal
infiltration  behavior  of  a  spatially-coupled  water  balance  model  cannot  be  reproduced  by 
modifying  the  parameters  of  an  uncoupled  model.  The  corresponding  yield  prediction 
limitations  of  the  uncoupled  model  were  confirmed,  using  an  example,  both  at  the 
parameter  estimation  and  evaluation  stages. 

In  our  example,  however,  parameter  error  due  to  weather  biases  and  the  error  from 
lack  of  knowledge  of  initial  conditions  greatly  impacted  the  predictive  capability  of  the 
spatially-coupled  model,  and  had  less  effect  on  its  uncoupled  counterpart. 

Based  on  our  analysis  we  concluded  that  the  use  of  spatially-coupled  crop  models 
requires  high-quality  data.  Practical  precision  agriculture  applications  are  characterized 
by  uncertain  initial  conditions  and  the  possibility  that  the  weather  used  for  calibration  is 
not  representative.  Under  these  circumstances,  the  use  of  a  spatially-coupled  model  may 
not  be  justified,  especially  for  low  landscape  positions. 


CHAPTER  3 

PLANNING  CROP  SCOUTING  PATHS  WITH  OPTIMIZATION  ALGORITHMS 
AND  A  SELF-ORGANIZING  FEATURE  MAP 

Introduction 

In  the  context  of  decades  of  falling  commodity  prices,  climate  change,  and 
increasing  environmental  regulatory  pressure,  farmers  need  to  sustain  high  crop  yields 
and  incomes  year  after  year  in  order  to  survive.  Effective  risk  mitigation  requires  that 
farmers  make  crop  management  decisions  based  on  up-to-date  information. 

Crop  scouting  is  a  data-collection  activity  that  is  used  to  support  crop  management 
decisions  such  as  when  to  make  insecticide,  fungicide,  and  herbicide  applications.  A  crop 
scout  typically  walks  through  a  field  to  get  a  general  impression  of  its  state,  occasionally 
stopping  to  make  more  detailed  measurements.  The  kind  of  information  collected  by  crop 
scouts  depends  on  the  crop  in  question  and  the  decisions  to  be  made,  but  may  include 
qualitative  and  quantitative  assessment  of  the  presence  of  insects,  diseases,  weeds,  and 
water  stress. 

The  advent  of  precision  agriculture  (Pierce  and  Nowak,  1999)  and  precision 
integrated  pest  management  (Fleischer  et  al.,  1999)  has  brought  the  possibility  of  site- 
specific  applications.  It  is  increasingly  common  for  farmers  to  selectively  apply 
pesticides,  fertilizer,  lime,  and  other  products  to  areas  in  which  the  application  will 
maximize  profit.  This  in  turn  has  led  to  the  need  for  site-specific  scouting. 

Many  farmers  currently  accumulate  different  types  of  spatial  data  in  electronic 
format,  using  them  in  some  form  of  geographical  information  system  (GIS)  to  provide 
information about the spatial variability of factors that affect their crops. The amount of
information  available  varies  with  the  time  since  the  farmer's  adoption  of  spatial 
technologies,  as  well  as  his/her  level  of  investment,  but  may  range  from  having  only  a 
field  boundary  to  multiple  years  of  yield  data,  electrical  conductivity,  elevation,  and 
multiple  soil  test  datasets  (NRC,  1997). 

Scouting  can  be  integrated  into  a  spatial  data  management  system,  using  the 
scouting  maps  together  with  existing  information  in  a  spatial  database  to  generate 
application  maps  (Nelson  et  al.,  1999).  Currently  many  crop  scouts  in  the  U.S.  record 
their  data  on  pre-printed  paper  forms  that  are  later  completed  and  faxed  to  the  farmer.  A 
crop scout will typically service a number of growers, up to a total area of about 8000 to
10000 hectares, charging a fee per unit area. Their responsibility varies between growers,
from making all prescriptions and supervising subsequent applications, to merely
communicating their recommendations. Some scouts focus exclusively on insects; others
also include diseases and weeds. The price per unit area may vary by an order of magnitude
depending on the level of services rendered.

A  crop  scout  working  in  such  a  regime  needs  to  optimize  the  use  of  his/her  time; 
spending  too  much  time  per  unit  area  is  uneconomical,  and  spending  too  little  exposes  the 
scout  to  making  expensive  errors.  Moreover,  if  the  intent  is  to  describe  spatial  variability 
of  the  variable  of  interest  for  a  precision  agriculture  /  precision  IPM  application,  the 
placement  of  the  samples  becomes  especially  relevant  (Fleischer  et  al.,  1999). 

The  path  chosen  to  link  sampling  locations  strongly  influences  the  use  of  the 
scout's time. An important condition to be met by this path is that it must be closed, i.e.,
the scouting path must have the same starting and ending point. This condition eliminates
the downtime required for the scout to return to his/her vehicle from a distant position in
the  field. 

In  general,  the  search  for  a  convenient  scouting  path  can  be  imagined  as  the 
combination  of  two  activities: 

•  Determining  the  sampling  locations,  and 

•  Finding  the  shortest  tour  (closed  path)  linking  all  of  the  locations. 

The  optimal  placement  of  sampling  sites  has  been  extensively  treated  in  soil 
science (McBratney and Webster, 1981; Burgess and Webster, 1984; van Groenigen et al.,
1999; Ferreyra et al., 2002). Optimizing the path through a set of scouting sites has not
been  given  much  attention  in  the  agricultural  literature,  but  is  equivalent  to  a  classic 
problem  in  computer  science:  the  Traveling  Salesman  Problem  (TSP). 

The goal of this study is to develop objective methods for carrying out the two activities
listed above, i.e., sample placement and scouting path construction, in order to build scouting
maps. Two possible approaches to these problems are:

1. Sequential, in which the optimal sampling locations are determined first, followed
by a search process to solve the associated TSP, and

2. Simultaneous, in which sampling points and the tour are developed simultaneously.

Our specific objectives were to apply both approaches in a representative case
study and to compare their performance in terms of predictive error and practical
applicability, using runtime as the criterion to assess the latter.

Theory 

Sequential  Approach:  Sampling  Locations 

The  search  for  an  optimal  sampling  location  network  depends  on  various  factors. 
Although  methods  exist  for  determining  the  optimal  number  of  samples  required  to 
represent  a  data  set  with  a  given  error  level  (Gath  and  Geva,  1989),  in  agricultural 
practice the spatial sampling density is usually determined by the field's surface area and
economic considerations. Moreover, there may be few layers of available data. An
extreme  scenario  occurs  when  only  the  field  boundary  is  available,  digitized  from  a  map 
or  acquired  via  GPS.  This  data-poor  scenario  may  be  typical  of  many  farming  operations 
that  are  beginning  to  adopt  information  technology  and  precision  farming. 

In  site-specific  agriculture,  given  a  set  of  sampling  locations  across  a  field 
(sampling  scheme),  it  is  desirable  to  spatially  distribute  the  points  in  a  way  that  allows  the 
best  prediction  of  data  values  at  unsampled  locations  using  the  sampled  data.  This  is  an 
optimization  problem. 

Typically,  optimization  problems  involve  the  search  for  an  optimal  combination  of 
data  (sampling  locations,  in  our  case)  that  minimize  (or  maximize)  an  objective  function 
or  OF  (Winston,  1994).  In  the  data-poor  scenario  described  above,  a  suitable  OF  to 
minimize  is  the  Minimization  of  the  Mean  of  Shortest  Distances  (MMSD)  criterion 
defined  by  van  Groenigen  and  Stein  (1998).  The  MMSD  function  is  the  expectation  of 
the  distance  between  an  arbitrarily  chosen  point  within  the  study  region  and  the  sampling 
location nearest to it. For large sampling regions (e.g., an infinite plane), this criterion
produces an equilateral triangle grid. The criterion can be expressed as follows (van
Groenigen and Stein, 1998):

$$\phi_{MMSD}(S) = \frac{1}{M} \sum_{j=1}^{M} d(x_j, S) \tag{3-1}$$

where $S$ is the sampling scheme (set of sampling locations), $M$ is the total number of
evaluation points composing the field, $x_j$ is the $j$th evaluation point, and $d(x_j, S)$ is the
distance between point $x_j$ and the nearest sampling point. It is assumed that the evaluation
points are distributed across the area of interest on a finely meshed grid (ten meters, for
example).
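
The criterion of Equation 3-1 is simple to compute once the field has been rasterized. The following sketch assumes a hypothetical 100 m x 100 m square field on a 10 m grid and two made-up four-point schemes; the function name `mmsd` is introduced here for illustration.

```python
import math


def mmsd(scheme, eval_points):
    """Mean over evaluation points of the distance to the nearest sampling location."""
    total = 0.0
    for ex, ey in eval_points:
        total += min(math.hypot(ex - sx, ey - sy) for sx, sy in scheme)
    return total / len(eval_points)


# Hypothetical square field: cell centres of a 10 m raster.
grid = [(x + 5.0, y + 5.0) for x in range(0, 100, 10) for y in range(0, 100, 10)]
clustered = [(10.0, 10.0), (15.0, 10.0), (10.0, 15.0), (15.0, 15.0)]
spread = [(25.0, 25.0), (75.0, 25.0), (25.0, 75.0), (75.0, 75.0)]
print(mmsd(clustered, grid), mmsd(spread, grid))
```

A scheme that spreads its points evenly over the field yields a lower value than one that clusters them in a corner, which is why minimizing this function favors regular coverage.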

The OF must be combined with a generation mechanism, i.e., a method to search
iteratively for progressively better solutions to the problem. One powerful method is
Simulated Annealing (Aarts and Korst, 1990), a combinatorial optimization algorithm
that has been applied successfully to replace exhaustive searches in large problems
(Kirkpatrick et al., 1983) and is much less likely to become trapped in local optima of the OF
than more traditional methods such as gradient descent. Using simulated annealing, the sampling
scheme is iteratively perturbed by moving a randomly selected point in the scheme to a
new random location, keeping the new scheme if it improves on the previous value of the
OF and, if it does not, rejecting it with a probability that increases as the algorithm
progresses. Van Groenigen and Stein (1998) developed a variant of this method, called Spatial
Simulated Annealing, which differs from the above primarily in that the distance that a
point can be moved during a perturbation also decreases as the algorithm progresses.

A sampling scheme for the data-poor scenario can be made by coupling simulated
annealing with the MMSD criterion using minimal data: a field boundary to make a raster
map of the field interior (the evaluation points).
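
A minimal sketch of this coupling is given below. It is not the SANOS implementation: the Metropolis acceptance rule, the geometric cooling schedule, the restriction of candidate locations to the evaluation points, and all parameter values (`temp0`, `cooling`, `max_move0`, `min_move`) are assumptions chosen for illustration. The shrinking `max_move` imitates the spatial variant described above.

```python
import math
import random


def mmsd(scheme, eval_points):
    """Mean distance from each evaluation point to its nearest sampling location."""
    return sum(min(math.dist(e, s) for s in scheme) for e in eval_points) / len(eval_points)


def anneal_scheme(eval_points, n_samples, iters=2000, temp0=5.0, cooling=0.998,
                  max_move0=50.0, min_move=10.0, seed=0):
    """Simulated annealing over sampling schemes restricted to the evaluation points."""
    rng = random.Random(seed)
    scheme = rng.sample(eval_points, n_samples)   # random starting scheme
    cost = mmsd(scheme, eval_points)
    best, best_cost = list(scheme), cost
    temp, max_move = temp0, max_move0
    for _ in range(iters):
        k = rng.randrange(n_samples)
        # Perturb one point: move it to a random evaluation point within max_move.
        nearby = [p for p in eval_points if math.dist(p, scheme[k]) <= max_move]
        candidate = list(scheme)
        candidate[k] = rng.choice(nearby)
        new_cost = mmsd(candidate, eval_points)
        delta = new_cost - cost
        # Metropolis rule: always accept improvements; accept worse schemes
        # with a probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            scheme, cost = candidate, new_cost
            if cost < best_cost:
                best, best_cost = list(scheme), cost
        temp *= cooling
        max_move = max(min_move, max_move * cooling)   # spatial variant: shorter moves
    return best, best_cost


# Hypothetical 100 m x 100 m field rasterized on a 10 m grid.
grid = [(x + 5.0, y + 5.0) for x in range(0, 100, 10) for y in range(0, 100, 10)]
scheme, cost = anneal_scheme(grid, n_samples=5)
print(round(cost, 2), sorted(scheme))
```

Restricting candidates to the raster cells is one simple way to keep every sampling location inside the field boundary; a continuous-coordinate version would need an explicit point-in-polygon check instead.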

A  different  criterion  may  be  used  when  additional  information  is  available,  such  as 
a  semivariogram  of  the  spatial  random  variable  of  interest.  For  example,  if  the  scout's 
goal  is  to  estimate  crop  yield  throughout  the  field  from  values  of  yield  (or  a  proxy  such  as 
number  of  grains  per  unit  area)  measured  at  the  sampling  locations,  geostatistics  can 
provide  a  principled  optimal  solution  based  on  minimizing  kriging  variance  (van 
Groenigen  et  al.,  1999).  The  starting  point  is  a  spatial  covariance  model:  a  description  of 
how similarity among values of the variable sampled at different locations varies with the
distance  between  the  locations.  Considering  a  stationary  spatial  random  variable  Z,  its 
semivariance  is  defined  as: 

$$\gamma(\mathbf{h}) = \frac{1}{2} \mathrm{Var}\left[ Z(\mathbf{u} + \mathbf{h}) - Z(\mathbf{u}) \right] \tag{3-2}$$

where  u  is  a  location  in  space  and  h  is  a  given  displacement  away  from  it  (Deutsch  and 
Journel,  1992). 
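
In practice the semivariance of Equation 3-2 is estimated from data. The sketch below is a minimal illustration, assuming a regularly spaced one-dimensional transect and the classical estimator (half the mean squared difference between values separated by a given lag). The transect values are made up, and the function name is introduced here for illustration only.

```python
def empirical_semivariance(values, lag):
    """Half the mean squared difference between values `lag` grid steps apart."""
    pairs = [(values[i], values[i + lag]) for i in range(len(values) - lag)]
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))


# Hypothetical transect that alternates between two values.
transect = [2.0, 4.0, 2.0, 4.0, 2.0]
for lag in (1, 2):
    print(lag, empirical_semivariance(transect, lag))
```

For real yield maps the same calculation would be applied to pairs of points binned by separation distance in two dimensions, and a model semivariogram would then be fitted to the binned estimates.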

Ordinary  Kriging  (Goovaerts,  1997)  is  a  popular  method  for  spatial  interpolation. 
For  an  arbitrary  point  u  in  the  region  of  interest,  the  estimated  value  of  the  variable  of 
interest Z is the weighted sum of the measured values of Z at the n sampling locations $\mathbf{u}_i$.
Thus,

$$Z^*(\mathbf{u}) = \sum_{i=1}^{n} \lambda_i Z(\mathbf{u}_i) \tag{3-3}$$

where $\lambda_i$ are the weights, determined using the semivariogram of Z and assuming a
constant, albeit unknown, expectation $E[Z(\mathbf{u})] = m$.

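The error or kriging variance (KV) for ordinary kriging is defined as

$$\sigma^2_{OK}(\mathbf{u}) = \sum_{i=1}^{n} \lambda_i \gamma(\mathbf{u}_i - \mathbf{u}) + \psi \tag{3-4}$$

where $\psi$ is a Lagrange multiplier as described by Webster and Oliver (1990). Kriging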
variance  depends  on  the  sampling  scheme  geometry  and  on  the  semivariogram,  but  is 
independent  of  actual  data  values  (Goovaerts,  1997).  It  is  zero  at  the  sampling  locations, 
and  increases  away  from  them.  It  can  be  used  in  an  objective  function;  for  example,  van 
Groenigen  (2000)  proposed  a  method  in  which  the  mean  KV  value  over  the  field  is 
minimized,  and  another  that  minimizes  the  maximum  KV  (MMKV). 
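
The following sketch shows, under stated assumptions, how ordinary kriging weights and the kriging variance can be obtained from the sampling geometry and a semivariogram alone. It assumes the semivariogram form of the ordinary kriging system, a spherical model with no nugget and hypothetical sill and range, and four made-up sampling locations; the small linear solver is included only to keep the example self-contained.

```python
import math


def spherical(h, sill=1.0, a_range=100.0):
    """Spherical semivariogram with no nugget (hypothetical sill and range)."""
    if h >= a_range:
        return sill
    r = h / a_range
    return sill * (1.5 * r - 0.5 * r ** 3)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ordinary_kriging(samples, target, gamma=spherical):
    """Return (weights, Lagrange multiplier, kriging variance) at `target`."""
    n = len(samples)
    # Semivariances between samples, bordered by the unbiasedness constraint.
    a = [[gamma(math.dist(samples[i], samples[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    # Semivariances between each sample and the target location.
    b = [gamma(math.dist(s, target)) for s in samples] + [1.0]
    solution = solve(a, b)
    weights, psi = solution[:n], solution[n]
    variance = sum(w * g for w, g in zip(weights, b[:n])) + psi
    return weights, psi, variance


samples = [(0.0, 0.0), (50.0, 0.0), (0.0, 50.0), (50.0, 50.0)]
w_center, psi_center, kv_center = ordinary_kriging(samples, (25.0, 25.0))
w_node, psi_node, kv_node = ordinary_kriging(samples, (0.0, 0.0))
print(w_center, kv_center, kv_node)
```

No measured values enter the calculation, which is what allows the kriging variance to be used as an objective function before any sample has been collected.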

Sequential Approach: the Traveling Salesman Problem (TSP)

Imagine a salesman who has to travel across a network of cities, and that the
distance between each pair of cities is known. The TSP consists of finding the shortest
tour that visits every city exactly once and returns to the starting point. It may seem simple,
but no efficient exact algorithm for it is known. The decision version of the TSP belongs to a
family of problems known as NP-complete (Cormen et al., 2001); the worst-case runtimes of
known exact algorithms for NP-complete problems grow exponentially with the size of the
program input (the number of sampling locations, in this case). Thus, runtime increases
dramatically with increasing input size. There is much ongoing research on the TSP, and numerous
heuristics that yield approximate solutions have been proposed for it (Golden et al., 1980).
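
To make the idea of an approximate solution concrete, the sketch below combines two generic, widely used heuristics: a nearest-neighbor construction followed by 2-opt improvement. These are illustrative choices and are not necessarily the methods used in this study; the point coordinates are made up.

```python
import math


def tour_length(points, order):
    """Length of the closed tour that visits `points` in the given order."""
    return sum(math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
               for i in range(len(order)))


def nearest_neighbor(points, start=0):
    """Greedy construction: always walk to the closest unvisited point."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda k: math.dist(last, points[k]))
        unvisited.remove(nxt)
        order.append(nxt)
    return order


def two_opt(points, order):
    """Reverse tour segments while doing so shortens the tour."""
    order = list(order)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(order) - 1):
            for j in range(i + 1, len(order)):
                candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if tour_length(points, candidate) < tour_length(points, order) - 1e-12:
                    order, improved = candidate, True
    return order


# Hypothetical sampling locations (metres), listed in no particular order.
sites = [(0, 0), (40, 30), (10, 50), (50, 0), (25, 25), (0, 30), (50, 50), (30, 5)]
greedy = nearest_neighbor(sites)
refined = two_opt(sites, greedy)
print(tour_length(sites, greedy), tour_length(sites, refined))
```

Neither heuristic guarantees the shortest tour, and this naive 2-opt recomputes the full tour length for each candidate, so it is suitable only for small numbers of points.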

The Simultaneous Approach

The  Kohonen  self-organizing  feature  map,  or  SOFM  (Kohonen,  1982),  is  a  form  of 
neural  network  that  can  be  used  to  transform  high-dimensional  signal  pattern  inputs 
(such  as  several  layers  of  GIS  data  for  an  agricultural  field)  into  a  lower-dimensional 
representation  such  as  a  one-dimensional  scouting  path.  In  a  scouting  problem,  the  input 
data  are  vectors  corresponding  to  the  nodes  of  a  grid  (for  example,  with  ten-meter 
isometric  spacing)  overlaid  onto  the  field.  These  vectors  are  multidimensional;  each 
dimension is an attribute of the corresponding location, such as x, y, and, in data-rich
scenarios, electrical conductivity, elevation, slope, past yield maps, etc. The output is the
scouting path: the sequence of (sampling) locations to visit. These locations exist in the
high-dimensional space, but are topologically ordered in a lower-dimensional form, i.e., a
one-dimensional sequence (node 1, node 2, etc.). Since the input vector attributes are
expressed  in  different  units,  the  data  are  usually  normalized  in  order  to  make  the  distance 
calculations  meaningful. 

The SOFM is based on the idea of competitive learning. Each output node is
represented  by  a  neuron  (a  vector  having  the  dimensionality  of  the  input  data),  and  the 
neurons  compete  for  the  input  data  vectors.  This  competition  is  based  on  distance 
(measured  in  the  high-dimensional  space)  between  the  input  data  and  the  neurons.  The 
algorithm  is  iteratively  presented  a  vector,  selected  randomly  from  the  input  data.  The 
distance  between  the  vector  and  each  of  the  neurons  is  evaluated,  and  the  neuron  that  is 
nearest  to  the  input  vector  is  declared  the  winner.  The  winning  neuron  is  subsequently 
rewarded  by  being  moved  towards  the  input  vector.  The  neurons  that  are  near  the 
winning  neuron  (this  nearness  is  measured  in  the  low  dimensional  space)  are  also  moved 
with  it  to  some  extent,  depending  on  the  value  of  a  neighborhood  function. 

Haykin (1994) defined the function thus: let $d_{j,i}$ denote the lateral distance of
neuron j from the winning neuron i, measured in the low-dimensional output space (such
that adjacent neurons have a distance of 1). Let $\pi_{j,i}$ denote the value of the
neighborhood function centered on the winning neuron i; its value is maximum for
$d_{j,i} = 0$, and must tend to zero as $d_{j,i}$ tends to infinity. A typical function used for this
purpose is the following Gaussian:

$$\pi_{j,i} = \exp\left( -\frac{d_{j,i}^2}{2\sigma^2(n)} \right) \tag{3-5}$$

where $\sigma(n)$ is the effective width of the topological neighborhood after n iterations of the
process. In each iteration the neurons are updated as follows:

$$\mathbf{w}_j(n+1) = \mathbf{w}_j(n) + \eta(n)\, \pi_{j,i(\mathbf{x})}(n) \left( \mathbf{x}(n) - \mathbf{w}_j(n) \right) \tag{3-6}$$

where $\mathbf{w}_j(n)$ is the state of neuron j after n iterations, $\eta$ is the learning rate, and $i(\mathbf{x})$ is the
winning neuron corresponding to input vector $\mathbf{x}$. This iterative update process is repeated
thousands of times.

The learning-rate parameter $\eta$ used to update the weight vectors and the effective
width of the neighborhood function $\sigma$ should decrease as the algorithm progresses,
similar to the cooling process described earlier for simulated annealing. Ritter et al.
(1992) proposed:

$$\sigma(n) = \sigma_0 \exp\left( -\frac{n}{\tau_\sigma} \right) \tag{3-7}$$

$$\eta(n) = \eta_0 \exp\left( -\frac{n}{\tau_\eta} \right) \tag{3-8}$$

where $\sigma_0$ and $\eta_0$ are the function values when n = 0, and $\tau_\sigma$, $\tau_\eta$ are time constants that
determine the rate of decay.

The  topological  ordering  property  results  from  the  update  Equation  3-6,  which 
forces  the  winning  neuron  to  move  toward  the  input  vector  x.  It  also  moves  the  weight 
vectors  of  the  nearby  neurons  contained  within  the  neighborhood  function.  Thus,  the  one- 
dimensional  incarnation  of  the  SOFM  can  be  visualized  as  an  elastic  band  containing  a 
sequence  of  nodes  that  exist  in  the  high-dimensional  input  space  (Haykin,  1994). 
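
The update rule and decay schedules can be sketched in a few lines of Python. This is an illustrative implementation under assumptions that are not taken from the study: the neurons are arranged on a ring (so the resulting path is closed), the inputs are two-dimensional normalized coordinates, and the values of `sigma0`, `eta0`, `tau_sigma`, `tau_eta`, and the iteration count are hypothetical.

```python
import math
import random


def train_ring_sofm(data, n_neurons, iters=3000, sigma0=2.0, eta0=0.8,
                    tau_sigma=1000.0, tau_eta=2000.0, seed=0):
    """One-dimensional SOFM on a ring; returns the neurons in path order."""
    rng = random.Random(seed)
    dim = len(data[0])
    neurons = [list(rng.choice(data)) for _ in range(n_neurons)]
    for n in range(iters):
        sigma = sigma0 * math.exp(-n / tau_sigma)   # neighborhood width decay
        eta = eta0 * math.exp(-n / tau_eta)         # learning-rate decay
        x = rng.choice(data)
        # Competition: the neuron closest to the input vector wins.
        winner = min(range(n_neurons), key=lambda j: math.dist(neurons[j], x))
        for j in range(n_neurons):
            d = abs(j - winner)
            d = min(d, n_neurons - d)               # lateral distance along the ring
            pi = math.exp(-d * d / (2.0 * sigma * sigma))
            for k in range(dim):
                neurons[j][k] += eta * pi * (x[k] - neurons[j][k])
    return neurons


# Hypothetical input: normalized (x, y) coordinates of a 10 x 10 grid of field nodes.
data = [((i + 0.5) / 10.0, (j + 0.5) / 10.0) for i in range(10) for j in range(10)]
path = train_ring_sofm(data, n_neurons=8)
for node in path:
    print(round(node[0], 2), round(node[1], 2))
```

Visiting the returned neurons in index order and returning from the last to the first gives a candidate scouting tour; additional attributes such as elevation could be appended to each input vector after normalization.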

Materials  and  Methods 
Case  Study  Problem,  Location,  and  Dataset 

The  case  study  is  a  yield  estimation  problem  in  a  data-poor  scenario  (as  previously 
defined).  This  is  not  an  obvious  application  of  scouting,  but  it  can  be  a  valuable  decision- 
support  tool  for  farmers  negotiating  futures  contracts.  In  such  a  situation,  yield  estimates 
would  be  made  prior  to  crop  maturity  using  a  proxy  such  as  grain  number  per  unit  area. 
We used two versions of the sequential approach and one of the simultaneous approach (a
total  of  three  different  methods,  henceforth  called  "cases")  to  propose  sampling  schemes 
and  tours  at  nine  different  sampling  densities  ranging  from  0.57  samples  /  ha  to  10 
samples  /  ha. 

Our  study  area  was  the  McCallon  1  field  near  Murray,  KY,  USA  (36°  32'  N, 
88°  27'  W,  elevation  222  m).  Its  surface  area  is  8.33  ha  (22.1  acres).  Soils  in  McCallon  1 
are  predominantly  somewhat  poorly  drained  Calloway  soils  (Glossaquic  Fragiudalfs)  and 
poorly  drained  Henry  (Typic  Fragiaqualfs)  soils.  Both  have  a  fragipan.  Available  data 
were  the  field  boundary,  maize  (Zea  mays  L.)  yield  maps  for  1999  and  2001,  and  an 
additional  yield  map  taken  in  the  1999  harvest  at  a  nearby  field  called  Suggs  4.  The  latter 
yield  map  was  used  to  provide  a  semivariogram;  we  assumed  its  spatial  covariance 
structure  could  be  representative  of  the  crop  in  other  years  and  in  similar  fields  such  as 
McCallon 1, and thus be usable to drive the MMKV criterion in the absence of actual
previous  McCallon  1  yield  data. 

For  each  sampling  scheme,  maize  yield  data  were  obtained  by  averaging  all  the  raw 
yield  map  data  available  within  a  5  m  radius  of  each  sampling  location.  The  resulting  data 
were used to estimate the yield throughout the field on a 10 m grid of evaluation points,
using  ordinary  point  kriging  as  detailed  further  below. 
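
The local averaging step can be sketched as follows. This is a hedged illustration, not the code used in the study: the function name, the tuple layout of the raw yield records, and the choice to include points lying exactly on the 5 m circle are all assumptions.

```python
import math

def mean_yield_within_radius(sample_xy, raw_points, radius=5.0):
    """Average the raw yield values lying within `radius` meters of a sampling location.

    sample_xy  : (x, y) of the sampling location, in meters
    raw_points : iterable of (x, y, yield_value) records from the yield map (assumed layout)
    Returns None when no raw point falls inside the circle.
    """
    sx, sy = sample_xy
    values = [v for x, y, v in raw_points
              if math.hypot(x - sx, y - sy) <= radius]
    return sum(values) / len(values) if values else None

# Made-up records: two points inside the 5 m circle, one outside.
raw = [(0.0, 0.0, 8000.0), (3.0, 3.0, 9000.0), (10.0, 0.0, 12000.0)]
print(mean_yield_within_radius((0.0, 0.0), raw))  # 8500.0
```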
The  Sequential  Approach 

We  used  the  SANOS  program  (van  Groenigen  and  Stein,  1998)  to  determine  the 
optimal  sampling  locations  using  both  the  MMSD  and  minimal  maximum  KV  (MMKV) 
criteria.  SANOS  is  a  versatile  program  that  can  design  sampling  schemes  in  complex 
domains.  It  can  accommodate  a  finite,  discontinuous  region  composed  of  arbitrarily 
shaped  subregions,  and  it  can  integrate  existing  sampling  locations  into  its  optimization. 
The runtime of the spatial simulated annealing algorithm in SANOS is user-specified,
but an optimal value can be calculated within SANOS using the optimal initial
transition  probability  estimation  method  proposed  by  Aarts  and  Korst  (1990). 

For  the  TSP  we  used  the  3-Opt  algorithm  as  implemented  by  Syslo  et  al.  (1983). 
Although  it  is  not  guaranteed  to  produce  the  optimal  solution  to  any  given  TSP,  this 
algorithm  produces  high-quality  approximate  solutions  very  rapidly,  as  shown  by 
empirical  studies  such  as  that  of  Golden  et  al.  (1980). 

During  the  remainder  of  this  study,  we  refer  to  two  sequential  cases:  MMSD+TSP 
(using  SANOS  with  the  MMSD  criterion  and  using  the  3-Opt  TSP  solution),  and 
MMKV+TSP  (as  above  but  using  SANOS  with  the  MMKV  criterion).  In  both  cases  we 
used  the  field  boundary  to  build  a  domain  for  SANOS,  used  SANOS  to  propose  a 
sampling  scheme,  and  then  used  the  3-Opt  code  to  propose  the  scouting  tour.  We 
repeated  the  process  for  several  sampling  densities. 

With  respect  to  the  spatial  interpolation  step,  we  used  the  Suggs  4  semivariogram 
for  kriging  in  the  MMKV+TSP  case,  and  a  linear,  zero-nugget  semivariogram  (typically 
the  default  in  many  geostatistical  packages)  in  the  MMSD+TSP  case. 
The  Simultaneous  Approach 

The third case under study was a variant of a 1-D SOFM. We altered the
topological neighborhood function to force the SOFM to close on itself, i.e., to make a closed
tour, calculating the distance d_{i,j} between neurons i and j as follows:

    if abs(j - i) > int(N / 2)
        then d = N - abs(j - i)
        else d = abs(j - i)
    return d

where N is the total number of neurons. Thus, if i and j are the first and last neurons, d_{i,j} is
now 1 instead of N - 1.
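
The distance rule above translates directly into Python. The function and argument names below are illustrative; the logic follows the pseudocode, using integer division in place of int(N / 2).

```python
def ring_distance(i, j, n_neurons):
    """Index distance between neurons i and j on a closed chain of n_neurons."""
    d = abs(j - i)
    if d > n_neurons // 2:
        d = n_neurons - d  # wrap around the closed tour
    return d

print(ring_distance(0, 21, 22))  # 1: first and last neurons are now neighbors
print(ring_distance(0, 11, 22))  # 11: the farthest pair on a 22-neuron ring
```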

The  field  boundary  was  converted  into  10-meter  raster  data  corresponding  to  the 
locations  inside  the  field  boundary.  These  833  x-y  pairs  were  used  as  input  to  the  SOFM 
algorithm. Considering the possibility of the results being parameter-dependent, we made
21 realizations of the SOFM, keeping three parameters constant (η_0 = 1, τ_σ = 20000,
τ_η = 20000) and varying σ_0 from 0.5 to 1.0 in steps of 0.025.

In  order  to  represent  typical  SOFM  solutions  in  the  subsequent  evaluation  of  the 
three  cases,  we  picked  the  SOFM  realization  having  the  median  MMSD  value. 

For  the  spatial  interpolation  step  we  used  a  linear  semivariogram  and  ordinary 
kriging  as  in  the  MMSD+TSP  case. 
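
A closed 1-D SOFM of the kind described in this subsection can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the code used in the study: the Gaussian neighborhood function in neuron-index units, the random initialization from input points, the fixed iteration count, and all names are assumptions.

```python
import math
import random

def train_ring_sofm(points, n_neurons, sigma0=0.75, eta0=1.0,
                    tau_sigma=20000.0, tau_eta=20000.0, n_iter=50000, seed=0):
    """Fit a closed 1-D SOFM to 2-D points; returns neuron positions in tour order."""
    rng = random.Random(seed)
    # Assumption: neurons start at randomly chosen input points.
    w = [list(rng.choice(points)) for _ in range(n_neurons)]
    for n in range(n_iter):
        x = rng.choice(points)
        # Winning neuron: the weight vector closest to the input vector.
        win = min(range(n_neurons),
                  key=lambda j: (w[j][0] - x[0]) ** 2 + (w[j][1] - x[1]) ** 2)
        sigma = sigma0 * math.exp(-n / tau_sigma)  # shrinking neighborhood width
        eta = eta0 * math.exp(-n / tau_eta)        # shrinking learning rate
        for j in range(n_neurons):
            d = abs(j - win)
            d = min(d, n_neurons - d)              # closed-chain index distance
            h = math.exp(-d * d / (2.0 * sigma * sigma))  # assumed Gaussian neighborhood
            w[j][0] += eta * h * (x[0] - w[j][0])
            w[j][1] += eta * h * (x[1] - w[j][1])
    return [tuple(p) for p in w]

# Small demonstration on a made-up 10 m grid (not the McCallon 1 field).
grid = [(10.0 * ix, 10.0 * iy) for ix in range(10) for iy in range(5)]
tour = train_ring_sofm(grid, n_neurons=8, tau_sigma=2000.0, tau_eta=2000.0, n_iter=5000)
print(tour)
```

The returned list gives both the sampling locations and the order in which to visit them, which is what makes the approach simultaneous.
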
Evaluation  of  Results 

The  three  cases  applied  at  9  different  sampling  densities  were  evaluated  according 
to  a)  MMSD  values,  b)  capability  of  predicting  spatial  yield  variability  as  expressed  by 
the  root  mean  squared  error  (RMSE)  between  observed  and  estimated  yield  maps  over 
the  10-meter  grid  of  (833)  evaluation  points,  c)  predictive  capability:  relative  error  in 
predicting  the  field  average  yield  calculated  from  the  833  evaluation  points,  and  d)  tour 
length.  The  observed  values  mentioned  in  point  (b)  were  obtained  by  fitting 
semivariograms  to  the  observed  yield  map  data  points  and  then  using  ordinary  kriging  to 
resample  the  observed  yield  map  data  onto  the  833-point  grid. 
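
The four evaluation measures can be sketched as follows. The names are illustrative, and two points are assumptions of this sketch rather than statements from the text: MMSD is computed as the mean distance from each evaluation point to its nearest sampling location, and the relative error of the field mean is signed as estimated minus observed.

```python
import math

def _dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def mean_shortest_distance(eval_points, samples):
    """Mean distance from each evaluation point to its nearest sampling location."""
    return sum(min(_dist(p, s) for s in samples) for p in eval_points) / len(eval_points)

def rmse(observed, estimated):
    """Root mean squared error between observed and estimated values."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / len(observed))

def relative_error_of_mean(observed, estimated):
    """Signed relative error (%) of the estimated field mean."""
    mean_obs = sum(observed) / len(observed)
    mean_est = sum(estimated) / len(estimated)
    return 100.0 * (mean_est - mean_obs) / mean_obs

def tour_length(tour):
    """Length of a closed tour visiting the (x, y) points in the given order."""
    return sum(_dist(tour[k], tour[(k + 1) % len(tour)]) for k in range(len(tour)))

print(tour_length([(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]))  # 40.0
```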

To  explore  the  quality  of  the  TSP  solutions,  we  asked  an  expert  crop  consultant  to 
plot  what  he  felt  was  the  shortest  tour  through  the  different  sampling  schemes,  and 
compared  his  answers  with  the  solutions  provided  by  the  3-Opt  algorithm. 



A  simulated  annealing  algorithm  can  be  run  for  an  arbitrary  duration,  or  until  a 
time-invariant  solution  is  found.  SANOS  specifies  runtime  as  proposed  by  Aarts  and 
Korst  (1990),  assuring  a  slow  "cooling"  of  the  system  and  maximum  rejection  of  local 
minima.  As  applied  in  SANOS  using  parameter  values  suggested  by  van  Groenigen  et  al. 
(1999),  the  algorithm's  runtime  is  about  2.5  and  4  hours  for  the  MMSD  and  MMKV 
criteria, respectively, on a Pentium PC with a 1.3 GHz processor. This is
excessively  long  for  practical  applications,  so  we  explored  how  solutions  changed  if  the 
TSP  cases'  runtime  was  decreased  to  a  small  fraction  of  the  optimum  (1  min). 

Finally,  we  compared  the  semivariograms  from  the  1999  and  2001  observed  yield 
maps  in  the  McCallon  1  field  with  the  one  used  as  a  proxy  from  the  nearby  Suggs  4  field. 

Results  and  Discussion 

Sampling  Location  Layout 

Figures 3-1A, 3-1B, and 3-1C show the sampling locations and scouting tours
obtained  for  a  sampling  density  of  2.5  samples/ha  (1/ac)  by  the  MMSD+TSP, 
MMKV+TSP,  and  SOFM  cases,  respectively.  The  points  are  spread  quite  evenly  over  the 
field,  although  MMKV+TSP  allocates  more  points  to  the  periphery  than  the  other 
methods.  The  optimal  tours  vary  greatly  between  cases,  and  are  not  easily  predictable  a 
priori  by  observing  the  sampling  schemes. 

Some  spatial  data-collection  problems  require  sampling  schemes  with  unevenly 
distributed  sampling  locations.  These  applications  generally  involve  prior  knowledge, 
such  as  the  areas  of  the  field  more  prone  to  fungal  diseases.  Another  such  application 
involves  planning  a  soil  sampling  strategy  that  ensures  all  soil  mapping  units  get 
sampled,  even  if  they  are  very  small.  These  kinds  of  stratified  sampling  are  easily 
implemented in the MMSD+TSP and MMKV+TSP cases by modifying the generation
mechanism to guarantee that sampling locations assigned a priori to a mapping unit
remain within it.

However,  implementing  stratified  sampling  in  the  SOFM  case  is  more  complex  due 
to  its  simultaneous  sample  placement  and  path  determination.  An  a  priori  assignment  of 
sampling  locations  to  polygons  in  the  SOFM  could  result  in  complicated,  suboptimal  path 
shapes,  unless  the  neighborhood  functions  and  other  algorithm  parameters  could  be 
updated  dynamically  during  operation,  as  in  the  Kalman-filter-driven  Auto-SOFM 
algorithm  (Haese  and  Goodhill,  2001). 
Predictive  Accuracy 

Figure  3-2A  shows  how  MMSD  varied  with  sampling  density  in  the  three  cases. 
The  MMSD+TSP  algorithm  consistently  produced  the  lowest  MMSD  values,  although 
there  was  little  variability  among  cases.  The  better  performance  of  the  MMSD+TSP  case 
was  expected,  since  MMSD  is  what  was  being  optimized  in  it. 

Observed  mean  corn  yields  and  standard  deviations  in  McCallon  1  were  8,406  and 
1,222  kg/ha  (CV  =  14.5%)  respectively  in  1999,  and  9,912  and  1,174  kg/ha 
(CV = 11.8%) respectively in 2001. Yields were higher and less variable in 2001, a more
favorable  weather  year. 

Figure 3-1. Sampling locations and tour lengths for the three cases at a sampling density of
2.5/ha (22 points): (A) MMSD+TSP: 1,433 m; (B) MMKV+TSP: 1,395 m;
(C) SOFM: 1,424 m.

Figure 3-2. Evaluation of the sampling schemes produced by the three cases at different
sampling densities: (A) MMSD, (B) yield prediction RMSE for 1999,
(C) yield prediction RMSE for 2001, (D) percent error predicting 1999 mean
field yield, (E) percent error predicting 2001 mean field yield.


Figure 3-3. (A) Observed 1999 yield map; (B) MMKV+TSP estimate at 2.5 samples/ha
(22 points); (C) MMKV+TSP estimate at 10 samples/ha (88 points). Each panel is plotted in
map coordinates (x from 370000 to 370400, y from 4043950 to 4044150) with a legend scale
from 2000 to 11000.


Figures  3-2B  and  3-2C  show  estimated  yield  RMSE  vs.  sampling  density  for  1999 
and  2001.  RMSE  decreased  with  increasing  density,  and  varied  little  between  cases. 
None of the methods was clearly superior to the others for both years across the sampling
density range. Figures 3-2D and 3-2E show relative error in field average interpolated
yield vs. sampling density for 1999 and 2001. Error was generally low; average (across
cases and years) relative error ε_r was below 5% (less than 28% of the CV) for all
sampling  densities,  a  great  improvement  with  respect  to  estimating  the  field  mean  yield 
from  one  random  location. 

Figure 3-3A shows the observed yield map for 1999. Figure 3-3B shows the
surface interpolated from 22 points (sampling density: 2.5/ha ≈ 1/acre) obtained with the
MMKV+TSP algorithm. Although this layout estimated mean field yield with an error of
0.5% in 1999 and 3.2% in 2001, it reproduced spatial variability relatively poorly (RMSE
of 1,064 kg/ha in 1999 and 1,004 kg/ha in 2001). In contrast, Figure 3-3C shows the
surface obtained with the same algorithm and 88 points (density: 10/ha ≈ 4/acre). The
relative error of prediction of the mean at this density was 0.7% in 1999 and -0.3% in
2001, and RMSE improved to 742 kg/ha in 1999 and 786 kg/ha in 2001.

Planning  a  sampling  scheme  and  scouting  path  using  multivariate  data  (of  which 
the  x,y  pairs  of  the  data-poor  scenario  are  a  special  case)  can  be  considered  an  attempt  at 
accurately  depicting  the  joint  probability  distribution  of  the  data  using  a  small  sample. 
The  SOFM  tends  to  bias  this  representation  by  overrepresenting  regions  of  low  input 
density  and  underrepresenting  regions  of  high  input  density  (Haykin,  1994).  This  may 
actually  be  valuable  in  crop  scouting,  where  small,  distinct  regions  in  the  input  data 
distribution  may  correspond  to  environmental  conditions  favoring  pests,  weeds,  etc. 




Tour  Length 

Figure  3-4A  shows  tour  length  vs.  sampling  density  for  the  three  cases.  Tour 
lengths  were  quite  similar,  with  the  MMKV+TSP  case  producing  the  shortest  tour  at  five 
densities,  and  MMSD+TSP  producing  the  shortest  at  the  remaining  four.  The  SOFM 
produced  tours  that  were,  on  average,  4%  longer  than  the  average  of  the  other  cases. 

Figure 3-4B shows the difference between the tour lengths resulting from the TSP
algorithm and the expert. Note that:

• The algorithm- and expert-derived tour lengths coincided at the lowest (5-point)
sampling density,

• The algorithm-derived tours tended to be increasingly shorter than the
expert-derived tours as sampling density increased, and

• The trend was stronger for MMKV than for MMSD.

Figure 3-4. (A) Comparison of the three cases' tour lengths. (B) Difference between
MMSD / MMKV case tour lengths and expert-derived tour lengths. Note how
the algorithm-derived tours tend to be shorter, and how the difference
increases with sampling density.

Runtime

Our SOFM algorithm runs quickly (2-3 minutes), stopping when it reaches an
invariant scheme. Thus we did not try time-constrained SOFM runs. Likewise, the Syslo
et al. (1983) implementation of the 3-Opt algorithm we used to solve the TSP component
of the other cases ran remarkably fast. The spatial sampling design stage performed with
SANOS was where time reduction was necessary. The "optimal" runtime was derived from
van Groenigen et al.'s (1999) suggestion of using a conservative value (over 0.99) for a
parameter (called α) that sets how quickly the simulated annealing algorithm "cools".

When runtime of the TSP cases was constrained to only 1 minute by reducing α,
tour length decreased an average of 3.5% across cases and years, reflecting the more
clumped structure of the suboptimal schemes. The RMSE remained essentially the same,
as did the estimation of field mean, except for the MMKV cases, where estimation error
was about twice that of the time-unconstrained cases. The latter effect results from the
time required per iteration of the MMSD and MMKV cases: MMKV iterations involve
solving several systems of linear algebraic equations in order to determine the weights λ
shown in Equations 3-3 and 3-4. Conversely, MMSD iterations are relatively quick,
only requiring arithmetical comparisons. Under very constrained runtimes there may not
be enough iterations of the MMKV algorithm to attain proper equilibrium of the
simulated annealing algorithm at each acceptance probability level, whereupon the
method's rejection of local minima could break down. Thus, in practical time-critical
applications, the MMSD+TSP algorithm is preferable.
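
The link between the cooling parameter and runtime can be illustrated with a short calculation. This sketch assumes a geometric cooling schedule in which the control parameter is multiplied by α at each step; the schedule form, the start and end values, and the function name are assumptions for illustration and are not taken from SANOS.

```python
import math

def cooling_steps(alpha, c_start, c_end):
    """Number of multiplications by alpha needed to bring c_start down to c_end."""
    return math.ceil(math.log(c_end / c_start) / math.log(alpha))

# Illustrative values: lowering alpha shortens the schedule sharply.
for alpha in (0.99, 0.95, 0.80):
    print(alpha, cooling_steps(alpha, 1.0, 1e-3))
```

Under this assumption, fewer cooling steps leave fewer iterations at each acceptance probability level, which is the mechanism the paragraph above points to for the weaker MMKV results under a one-minute limit.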
Semivariograms

Table 3-1 shows the parameters of the exponential models fitted to the three
semivariograms derived from raw yield map data. The nugget (C0), sill (C0+C1), and
effective range (r) values differ between the three semivariograms. It has been noted by
van Groenigen (2000) that changes in variogram parameters can impact the results of a
sampling scheme based on minimizing kriging variance. Thus, although MMKV+TSP is
based on sound geostatistical principles whereas MMSD+TSP is empirical, the advantage
of MMKV+TSP when using a proxy semivariogram is questionable.

Table 3-1. Standardized variograms of the 1999 and 2001 McCallon 1, and 1999 Suggs 4
maize data. Columns are the nugget effect or C0, sill or C0+C1, and effective
range r.

Field and year     C0      C0 + C1    r
McCallon 1 '99     0.35    0.75       117
McCallon 1 '01     0.45    0.556      75
Suggs 4 '99        0.2     0.8

Practical Considerations

Aside  from  improvement  with  respect  to  expert-derived  sampling  schemes  and 
paths,  using  one  of  the  proposed  methods  to  generate  a  scouting  map  has  several 
advantages  with  respect  to  a  paper-based  approach: 

•  Office-based  map  generation 

•  The  map  can  be  downloaded  to  a  GPS-enabled  handheld  computer  that  can  log 
results  and  ease  the  transfer  of  data  back  into  a  crop  database. 

•  Digital  scouting  tools  allow  the  farmer  to  keep  a  permanent  site-specific  record  of 
crop  state.  Thus,  scouting  maps  can  be  used  to  make  application  maps. 

•  Increased  accountability  of  scouting  performance:  files  contain  timestamps, 
location,  etc. 

•  Repeatedly  scouting  the  same  areas  can  allow  comparison  of  the  state  of  the  field 
over  time. 




•  Greater  potential  for  delegation. 

•  It  is  possible  to  bias  the  sample-locating  process  to  increase  the  chances  of  finding 
pests  that  are  first  detected  in  certain  kinds  of  environments.  It  is  thus  possible  to 
avoid  (or  prioritize)  field  edges,  etc. 

Potential  drawbacks  include: 

•  Additional  hardware  /  software  requirements. 

•  The  requirement  of  following  a  set  path  may  potentially  be  less  cost-effective 
(more  time-consuming)  for  the  crop  scout. 

•  The  learning  process  of  using  new  technology. 

Conclusions 

The  methods  shown  herein  provide  a  principled  approach  to  the  design  of  crop- 
scouting  activities  as  a  form  of  spatial  sampling.  The  methods  are  sufficiently  quick  and 
accurate  to  be  usable  in  practical  applications. 

The TSP methods (MMKV+TSP and MMSD+TSP) tended to make slightly shorter
tours than the SOFM, although the three methods' tours were never longer than the
expert-derived tours. The TSP methods also typically estimated yield slightly better than the
SOFM. When runtime is unconstrained (and a semivariogram is available), the
MMKV+TSP case seems most appropriate. Conversely, when runtime is strongly
constrained, MMSD+TSP may be more dependable. In intermediate situations the three
methods are practically equivalent.


CHAPTER  4 

REDUCING SOIL WATER SPATIAL SAMPLING DENSITY USING SCALED
SEMIVARIOGRAMS  AND  SIMULATED  ANNEALING 

Introduction 

Estimating  the  spatial  and  temporal  patterns  of  soil  water  content  in  agricultural 
areas is of great value in various activities such as predicting crop yields, assessing the
fate  of  potentially  contaminating  crop  inputs,  and  estimating  soil  erosion.  The  necessary 
data  requirements  must  be  met  through  spatial  sampling,  and  the  spatial  density  of  the 
measurements  will  strongly  influence  the  cost  of  the  process,  the  quality  of  the  results, 
and  the  feasibility  of  a  long-term  study.  Careful  design  of  the  sampling  scheme  can  save 
time  and  money,  and  spatial  statistics  may  be  used  to  optimize  such  a  scheme  (Van 
Groenigen and Stein, 1998). The density reduction of an existing spatial network is a
related  problem,  relevant  in  many  regions  of  the  world  where  funding  for  environmental 
monitoring  is  decreasing. 

The  spatial  dataset  used  in  this  study  was  taken  from  an  ongoing  experiment 
running  since  1992  in  an  8-hectare  microwatershed.  The  spatial  variability  of  soil  water 
content  was  characterized  by  repeatedly  sampling  soil  water  at  57  locations  distributed 
throughout  the  microwatershed,  but  it  was  impractical  to  sample  all  the  points  with  the 
desired  temporal  frequency  over  an  extended  period  of  time.  It  became  necessary  to 
develop  a  methodology  to  reduce  the  number  of  sampling  points  while  maintaining  the 
ability  to  describe  spatial  soil  water  content  across  the  field.  Our  goal  was  to  identify  a 
time-invariant  relationship  between  the  water  content  measurements  across  the  57 
locations. The existence of such a relationship would allow us to infer the spatial pattern
of  soil  water  content  over  the  entire  microwatershed  at  future  dates  by  sampling  a 
reduced  subset  of  locations. 

Temporal  Stability 

Formally, the concept of a time-invariant relationship between water contents at
different locations may hold only for covered plots draining freely; however, there is
evidence that it has a wider range of application (Sisson, 1987). Vachaud et al. (1985)
developed a technique for reducing spatial sampling density, based on the concept of
temporal stability of soil water content. They defined this as a time-invariant association
between  spatial  location  and  classical  statistical  parameters,  emphasizing  the  persistence 
of  the  rank  of  soil  water  content  measured  at  different  locations  in  a  network.  This  idea 
has  been  subsequently  tested  under  different  conditions  by  several  authors,  with 
contradictory  results  as  shown  below. 
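
Rank persistence between two sampling dates can be checked with a rank correlation. The sketch below uses Spearman's coefficient computed from rank differences, which assumes there are no tied values; the data are made up for illustration and the function names are not from any cited work.

```python
def ranks(values):
    """Rank positions (1 = smallest); ties are broken by order of appearance."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0] * len(values)
    for rank, k in enumerate(order, start=1):
        r[k] = rank
    return r

def spearman(a, b):
    """Spearman rank correlation between two equally long series without ties."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Made-up water contents at five locations on a wet date and a drier date.
date1 = [0.21, 0.25, 0.19, 0.30, 0.27]
date2 = [0.15, 0.18, 0.13, 0.24, 0.20]
print(spearman(date1, date2))  # 1.0: every location keeps its rank
```

A coefficient near 1 indicates that wet locations stay relatively wet and dry locations stay relatively dry, even when the field mean changes between dates.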

The  work  reported  by  Vachaud  et  al.  (1985)  involved  data  sets  that  showed  no 
spatial  correlation,  perhaps  due  to  the  great  heterogeneity  of  the  soil  properties  in  the 
study  locations.  This  sample  independence  made  it  possible  to  study  the  temporal 
stability  of  the  data  using  simple  statistics.  However,  at  many  scales  of  interest,  soils  are 
not  necessarily  randomly  distributed;  Kachanoski  and  de  Jong  (1988)  presented 
additional  tests  of  temporal  stability  in  the  context  of  spatial  associations  in  the  data  and 
scale  dependency.  Other  researchers  have  observed  temporal  stability  in  soil  water 
patterns:  Goovaerts  and  Chiang  (1993)  in  a  long-term  fallow  plot,  Kamgar  et  al.  (1993)  in 
bare  soil  laid  out  in  furrows  and  beds,  Zhang  and  Berndtsson  (1988)  on  short-cut  grass, 
Jaynes  and  Hunsaker  (1989)  in  an  irrigated  wheat  field,  and  Reichardt  et  al.  (1993)  under 
different  conditions  of  land  cover  ranging  from  bare  soil  to  a  corn  crop. 

Other studies produced mixed results. Cassel et al. (2000) observed greater
temporal  stability  of  water  content  in  deeper  soil  layers  than  in  shallow  layers  under  a 
wheat  crop.  This  effect  could  be  attributed  to  the  impact  of  crop  root  water  uptake. 
Grayson  and  Western  (1998)  studied  three  catchments  having  significant  relief,  and 
observed  that  although  the  overall  spatial  soil  moisture  patterns  were  not  time  stable,  the 
measurements  in  a  specific  subset  of  the  locations  within  the  measurement  network  were 
time  stable  and  could  adequately  represent  mean  soil  moisture  over  their  areas  of  interest. 
Grayson  and  Western  (1998)  denoted  the  locations  in  this  subset  as  catchment  average 
soil  moisture  monitoring  (CASMM)  sites.  Comegna  and  Basile  (1994)  obtained  opposite 
results:  they  observed  both  spatial  associations  between  water  content  across  locations, 
and  a  time-stable  spatial  structure  for  the  water  content  in  the  top  90  cm  of  the  soil 
profile.  However,  they  were  unable  to  find  CASMM  locations,  and  attributed  this  effect 
to  the  great  homogeneity  of  the  volcanic  soil  of  their  study  site. 

Other  researchers  have  observed  a  lack  of  temporal  stability.  Van  Wesenbeeck  et 
al.  (1988)  observed  that  the  spatial  pattern  of  surface  (0-0.2  m)  soil  water  content  below  a 
corn  crop  was  not  stable  over  time,  but  was  a  function  of  crop  growth  stage  and  mean  soil 
water  content.  Mohanty  et  al.  (2000)  observed  time  instability  of  soil  moisture  patterns  in 
a  gently  sloping  range  field,  and  suggested  that  this  might  be  the  consequence  of  lateral 
base  flow  and  aspect-driven  accelerated  or  decelerated  evapotranspiration  and 
condensation. Indeed, Kachanoski and de Jong (1988) pointed out that soil water content
at  a  point  is  the  product  of  hydrologic  processes  operating  at  different  spatial  scales. 

Variogram  analysis  has  been  used  very  effectively  to  study  spatial  associations 
(Vieira  et  al.,  1983;  McBratney  and  Webster,  1981;  Burgess  and  Webster,  1980). 
However, in the context of temporal analysis, seasonal differences in rainfall and water
content in a field may render a variogram calculated at one time not representative of the
conditions at another. Kachanoski and de Jong (1988) noted that time stability results in
time  independence  of  the  normalized  semivariogram  but  not  necessarily  the  ordinary 
semivariogram.  Vieira  et  al.  (1991)  proposed  a  variogram  scaling  technique,  dividing  the 
variogram  of  the  observations  taken  at  a  particular  date  by  their  sample  variance,  and 
subsequently  merging  several  dates'  variograms  into  one.  Comegna  and  Basile  (1994) 
and  Vieira  et  al.  (1997)  applied  this  concept  to  time  stability  analysis  of  soil  water 
content. 

Simulated  Annealing 

The  spatial  sampling  density  reduction  problem  requires  selecting  a  subset  of  the 
original  dataset  that  will,  in  combination  with  a  spatial  interpolation  algorithm,  produce 
the  best  possible  estimate  of  the  variable  of  interest  at  the  points  that  will  no  longer  be 
sampled.  This  is  a  nontrivial  combinatorial  problem  when  the  number  of  locations 
involved  is  high.  An  optimization  algorithm  may  be  used  to  search  for  a  solution,  but  the 
algorithm  in  question  should  converge  to  the  global  optimum.  Simulated  annealing  (Aarts 
and  Korst,  1990)  is  such  a  method;  different  forms  of  the  algorithm  originally  proposed 
by  Metropolis  et  al.  (1953)  have  been  recently  applied  to  numerous  problems  such  as 
modeling  spatial  variability  of  heavy  metal  concentration  in  soils  (Lin  and  Chang,  2000), 
spatial  variability  of  phosphorus  content  and  texture  (van  Groenigen  et  al.,  1999),  soil 
pore structure modeling (Moran and McBratney, 1997), and soil parameter estimation for
functional  crop  models  (Calmon  et  al.,  1999;  Braga  and  Jones,  1998;  Braga  et  al.,  1998; 
Shen  et  al.,  1998;  Paz  et  al.,  1998).  Examples  of  combinatorial  applications  of  simulated 
annealing  range  from  printed  circuit  board  design  (Kirkpatrick  et  al.,  1983)  to  the 
selection of representative nodes in a meteorological network (Robledo, 1994) and the
determination  of  optimal  soil  sampling  strategies  for  precision  agriculture  research  (Van 
Groenigen  et  al.,  2000). 

Our  objectives  in  this  study  were  a)  to  model  the  spatial  variability  of  soil  water 
across several dates in an 8 ha microwatershed using measurements taken at 57 locations,
and b) to define a reduced subset of 10 of the original measurement network locations
which  could  be  used  to  adequately  predict  the  soil  water  content  in  the  rest  of  the 
network. 

Materials  and  Methods 

Study  Location 

Our  study  area  is  an  8-hectare  microwatershed  (64°  13'  W,  31°  29'  S)  located  25 
km  to  the  south  of  the  city  of  Cordoba,  Argentina.  The  conditions  in  this  field  are 
considered  representative  of  approximately  20000  hectares  affected  by  water  erosion  in 
the  region  (Romero  et  al.,  1995).  A  map  of  the  microwatershed  including  elevation  level 
curves  is  shown  in  Figure  4-1.  Slope  in  the  microwatershed  varies  from  0.8  to  1.2%,  and 
runoff  is  discharged  through  a  flume  located  at  its  southeastern  corner  (coordinates  x  =  0, 
y  =  280  in  Figure  4-1).  The  soil  is  a  silty  loam  Typic  Haplustoll.  A  modal  horizon  profile 
is A1 (0-14 cm), A2 (14-20 cm), Bw (20-40 cm), BC (40-60 cm), C (60-84 cm), and Ck
(84+ cm). The soil is very deep, and the depth to the water table is approximately 20
meters. Soybeans (Glycine max (L.) Merrill) were grown on the microwatershed under
conventional tillage every year since 1990. An Agripo maturity group VII variety was
used in the 1991/92 season and an Asgrow maturity group VI variety was used
thereafter.

Figure 4-1. Layout of the microwatershed, showing the sampling locations as numbered
crosses. Coordinates are expressed in meters. Note the rotated map
orientation.

The  measurement  layout  consisted  of  a  grid  pattern  of  57  locations  covering  the 
entire microwatershed, as shown in Figure 4-1. The grid had an isometric interval of
41.66  m  between  adjacent  points.  Gravimetric  soil  water  content  measurements  were 
performed  in  each  of  the  grid  points  at  depths  of  0-30,  30-65,  and  65-100  cm,  and  these 
values  were  used  to  estimate  the  total  soil  water  content  to  a  depth  of  1  m.  This  study 
used  measurements  performed  on  2/7/1992,  2/24/1992,  3/20/1992,  1/25/1993,  and 
12/23/1993.  The  first  three  dates  were  used  to  develop  and  calibrate  a  model  of  spatial 
variability; the last two, belonging to other cropping seasons, were used for validation.
Semivariogram  Modeling 

We  analyzed  the  spatial  variability  of  the  soil  water  content  of  each  soil  layer  and 
of  the  total  water  content  in  the  first  meter  of  the  soil  profile  using  variogram  modeling 
(McBratney and Webster, 1986; Trangmar et al., 1985; Vieira et al., 1983). We
calculated  an  experimental  semivariogram  for  each  measurement  date  and  fitted  a 
continuous  function  to  it.  The  optimal  semivariogram  type  was  selected,  and  the 
semivariograms  were  tested,  using  cross-validation.  Apezteguia  et  al.  (1999)  reported 
these  results. 

For  validation  we  used  a  scaled  semivariogram  (Vieira  et  al.,  1997),  built  by 
dividing  the  experimental  semivariograms  of  each  of  the  three  calibration  dates  by  the 
sample  variance  of  each  date's  data,  and  fitting  a  new  continuous  function  to  the  union  of 
all  the  scaled  data.  Its  usage  will  be  described  under  Validation  below. 
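
As a rough illustration of the scaling step, the sketch below computes an omnidirectional experimental semivariogram for one date and divides it by that date's sample variance. The function names, the lag tolerance, and the toy data are assumptions made for illustration; they are not taken from the original analysis.

```python
import math
from statistics import variance

def experimental_semivariogram(coords, values, lag, n_lags, tol=None):
    """Return [(mean_distance, gamma, n_pairs)] for omnidirectional lag classes."""
    tol = lag / 2 if tol is None else tol
    sums = [[0.0, 0.0, 0] for _ in range(n_lags)]  # distance sum, squared-diff sum, pairs
    for a in range(len(values)):
        for b in range(a + 1, len(values)):
            h = math.dist(coords[a], coords[b])
            k = round(h / lag) - 1                 # nearest lag class (1*lag, 2*lag, ...)
            if 0 <= k < n_lags and abs(h - (k + 1) * lag) <= tol:
                sums[k][0] += h
                sums[k][1] += (values[a] - values[b]) ** 2
                sums[k][2] += 1
    return [(d / n, g / (2 * n), n) for d, g, n in sums if n > 0]

def scale_semivariogram(sv, values):
    """Divide each semivariance by the sample variance of that date's data."""
    s2 = variance(values)
    return [(h, g / s2, n) for h, g, n in sv]

# Toy example (hypothetical values): four points on a line, 1 m apart
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
vals = [1.0, 2.0, 4.0, 7.0]
sv = experimental_semivariogram(coords, vals, lag=1.0, n_lags=3)
scaled = scale_semivariogram(sv, vals)
```

Under this sketch, the scaled lists of the calibration dates would simply be concatenated before a single continuous model is fitted to their union.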

The  Density  Reduction  Problem 

The sampling density reduction problem consisted of choosing the subset of given
cardinality of the 57 measurement locations which would best approximate the spatial
distribution  of  soil  water  content  at  the  calibration  dates,  using  the  data  measured  at  the 
subset  to  estimate  the  data  at  the  remaining  points  by  means  of  a  spatial  interpolation 
algorithm.  The  optimization  process  was  formulated  as  the  minimization  of  an  objective 
or  fitness  function  J  that  will  be  discussed  below.  Long-term  sampling  cost 
considerations  set  the  subset  cardinality  to  10. 

Choosing  an  optimal  subset  of  10  points  out  of  57  is  a  complex  combinatorial 
problem;  using  a  brute-force  method  to  evaluate  all  the  possible  combinations  would  be 
very time-consuming given that $C(57,10) \approx 4.318 \times 10^{10}$ and each iteration is
computationally intensive. Instead, we approached the problem using two simulated
annealing algorithms: the one described by Sacks and Schiller (1988), and the newer
Spatial  Simulated  Annealing  algorithm  proposed  by  van  Groenigen  and  Stein  (1998). 
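
The size of the search space can be checked directly. The short sketch below uses Python's standard `math.comb`; the one-second evaluation time is a purely hypothetical figure, used only to convey the scale of an exhaustive search.

```python
import math

n_subsets = math.comb(57, 10)      # number of 10-point subsets of 57 locations
print(f"{n_subsets:.3e}")          # prints 4.318e+10

# Hypothetical cost: if one fitness evaluation took 1 s (an assumed figure),
# an exhaustive search would need this many years.
years = n_subsets / (3600 * 24 * 365)
print(f"{years:.0f} years")
```
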

The Sacks and Schiller algorithm (S&S) is designed to work on a discrete domain
$D$ (57 points, in our case). At any given iteration $j$, $S_j$ is a subset of $D$ of the
desired cardinality (10). The algorithm iteratively proposes and evaluates a new subset by
replacing one point from the previous subset, and accepts or rejects the change according
to a simple acceptance criterion. In each iteration the new pattern $S'$ is
proposed by randomly choosing an entering point $t \in (D - S_j)$, followed by the
deterministic selection of the exiting point $s^* \in S_j$ that minimizes the fitness
function $J(S')$, i.e.

$$J(S_j \cup t - s^*) = \min_{s \in S_j} J(S_j \cup t - s).$$

After each such change, the new value of $J$, $J(S')$, may or may not have improved
(decreased) with respect to the previous iteration. If it improved, then the new pattern is
accepted with a probability of 1. If it did not improve, then the pattern is accepted with
a probability given by a control parameter $\pi$, such that $0 < \pi < 1$. This $\pi$ is a
function that tends to decrease through the algorithm's execution, making it progressively
more improbable that the algorithm accepts new patterns that do not improve the solution.
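
The iteration just described can be sketched as follows. This is a minimal illustration, not the implementation documented in the Appendix: the function names, the geometric decay of the control parameter, and the default values are all assumptions, and the automatic increase of the control parameter is omitted.

```python
import random

def ss_step(subset, domain, fitness, pi, rng=random):
    """One S&S-style iteration: random entering point, best exiting point."""
    current = fitness(subset)
    t = rng.choice(sorted(domain - subset))           # random entering point t
    # Deterministic choice of the exiting point s* that minimizes J
    candidates = [(subset | {t}) - {s} for s in sorted(subset)]
    proposal = min(candidates, key=fitness)
    if fitness(proposal) < current or rng.random() < pi:
        return proposal                               # accept
    return subset                                     # reject

def ss_anneal(domain, k, fitness, pi0=0.5, decay=0.95, n_iter=200, seed=0):
    """Repeat ss_step while the acceptance probability pi decreases."""
    rng = random.Random(seed)
    subset = frozenset(rng.sample(sorted(domain), k))
    best, pi = subset, pi0
    for _ in range(n_iter):
        subset = ss_step(subset, domain, fitness, pi, rng)
        if fitness(subset) < fitness(best):
            best = subset
        pi *= decay                                   # assumed geometric schedule
    return best
```

Here `fitness` is any callable that maps a set of location identifiers to a number to be minimized, for example one of the two fitness functions described in the next section.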

The Spatial Simulated Annealing algorithm (SSA) is different from the previous
algorithm in three fundamental aspects: i) it is designed for a continuous domain, ii)
instead of only using the control parameter to set the acceptance probability, it also
includes the difference in fitness between the new and old patterns, and iii) instead of
replacing a point of the subset $S_j$ with another one belonging to $(D - S_j)$, it chooses a
point $s$ within the subset and moves it over space to a new location shifted with respect to
the original in a random direction and by a random distance, the latter bounded from
above by a function $h_{max}$ that tends to decrease as the algorithm execution progresses.
Thus, $s$ may initially be shifted large distances, but as the algorithm progresses the
movements become progressively smaller and less probable. In a manner similar to S&S,
the SSA control parameter decreases with time.

The  nature  of  our  problem  did  not  allow  the  use  of  the  SSA  algorithm  as  originally 
described  by  van  Groenigen  and  Stein  (1998);  it  was  only  possible  to  consider  locations 
from  the  discrete  domain  of  57  points  due  to  the  existence  of  several  years  of  other  data 
(crop  biomass,  yield,  etc.)  sampled  only  at  those  locations.  We  implemented  a  variation 
of  the  SSA  algorithm  that  moved  location  s  over  space,  but  only  to  candidate  locations  on 
the  grid.  Otherwise,  the  algorithm  is  almost  identical  to  the  one  presented  by  van 
Groenigen  and  Stein.  A  detailed  description  of  both  implemented  algorithms  (S&S  and 
SSA)  is  provided  in  the  Appendix. 
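
A grid-restricted SSA variant of this kind might look like the sketch below. It is only an illustration under stated assumptions: the Metropolis-style acceptance rule `exp((J_old - J_new) / c)`, the geometric cooling and shrinking schedules, the floor on the maximum shift, and all names and default values are assumptions rather than the settings given in the Appendix.

```python
import math
import random

def ssa_discrete(points, k, fitness, c0=1.0, cooling=0.99, shrink=0.99,
                 n_iter=500, seed=0):
    """SSA-like search restricted to a fixed set of candidate locations.

    points: dict {location_id: (x, y)}; fitness: callable on a frozenset of ids.
    """
    rng = random.Random(seed)
    ids = sorted(points)
    dists = [math.dist(points[a], points[b]) for a in ids for b in ids if a != b]
    h_max, h_min = max(dists), min(dists)   # the shift never drops below grid spacing
    subset = frozenset(rng.sample(ids, k))
    j_old = fitness(subset)
    best, j_best, c = subset, j_old, c0
    for _ in range(n_iter):
        s = rng.choice(sorted(subset))      # point of the subset to be moved
        # Candidate destinations: unused grid locations within h_max of s
        near = [t for t in ids if t not in subset
                and math.dist(points[s], points[t]) <= h_max]
        if near:
            proposal = (subset - {s}) | {rng.choice(near)}
            j_new = fitness(proposal)
            # Improvements are always accepted; otherwise the acceptance
            # probability depends on the fitness difference and on c
            if j_new <= j_old or rng.random() < math.exp((j_old - j_new) / c):
                subset, j_old = proposal, j_new
                if j_old < j_best:
                    best, j_best = subset, j_old
        c *= cooling                        # assumed cooling schedule
        h_max = max(h_min, h_max * shrink)  # assumed shrinking maximum shift
    return best, j_best
```
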
Fitness  Functions 

In  each  iteration  of  the  simulated  annealing  algorithms,  we  used  a  fitness  function 
to  describe  the  ability  of  the  proposed  pattern  S'  to  predict  the  water  content  throughout  D 
at  all  the  dates  of  interest.  The  prediction  of  water  content  was  performed  with  a  spatial 
interpolation  algorithm.  We  applied  ordinary  kriging  (Deutsch  and  Journel,  1992),  using 
each  calibration  date's  calculated  semivariogram.  We  performed  the  process  with  two 
different  fitness  functions,  both  of  which  simultaneously  evaluated  the  performance  of 
candidate  subsets  across  all  the  calibration  dates.  The  functions  are  described  below. 
Scaled kriging variance (SKV)

The scaled kriging variance function is defined as

$$\mathrm{SKV} = \frac{1}{N} \sum_{j} \sum_{i} \frac{\sigma^2_{K,i,j}}{s_j^2},$$

where $\sigma^2_{K,i,j}$ is the kriging variance at point $i$ on date $j$, $s_j^2$ is the
variance of the observed water content data across the microwatershed on date $j$, and
$N$ is the total number of date-space combinations, slightly less than $3 \times 57$
(calibration set) or $2 \times 57$ (validation set) due to the existence of missing data.
The SKV function adds the kriging variance of water content across all the points $i$ of
the microwatershed and across all the dates of interest $j$, scaling it by the variance of
the observed water content data across the microwatershed on the corresponding date.
Provided that the intrinsic hypothesis of geostatistics is valid, the predictive accuracy of
ordinary kriging can be expressed by the kriging variance (van Groenigen, 2000). Thus,
finding a solution that minimizes the kriging variance (or the SKV function, in this case)
can be expected to maximize predictive accuracy.
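
To make the SKV computation concrete, the sketch below solves the ordinary kriging system in pure Python and averages the scaled kriging variance over dates and estimated points. It is not the GSLIB kb2d routine used in the study: the exponential model form with an effective range, the absence of a search radius, and all function names are assumptions.

```python
import math

def exp_model(h, c0, c1, a):
    """Exponential semivariogram with effective range a (assumed form)."""
    return 0.0 if h == 0 else c0 + c1 * (1.0 - math.exp(-3.0 * h / a))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ok_variance(data_xy, target_xy, gamma):
    """Ordinary kriging variance at target_xy; depends on locations only."""
    n = len(data_xy)
    A = [[gamma(math.dist(p, q)) for q in data_xy] + [1.0] for p in data_xy]
    A.append([1.0] * n + [0.0])
    b = [gamma(math.dist(p, target_xy)) for p in data_xy] + [1.0]
    w = solve(A, b)          # n weights followed by the Lagrange multiplier
    return sum(wi * bi for wi, bi in zip(w, b))

def skv(subset_xy, other_xy, models, variances):
    """Mean of (kriging variance / sample variance) over dates and points."""
    total, count = 0.0, 0
    for gamma, s2 in zip(models, variances):
        for xy in other_xy:
            total += ok_variance(subset_xy, xy, gamma) / s2
            count += 1
    return total / count
```

Because `ok_variance` never looks at the measured values, two patterns with the same geometry receive the same SKV regardless of what was observed at their points.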
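Scaled mean squared error (SMSE)

The scaled mean squared error function is defined as

$$\mathrm{SMSE} = \frac{1}{N} \sum_{j} \sum_{i} \frac{(\hat{\theta}_{i,j} - \theta_{i,j})^2}{s_j^2},$$

where $\hat{\theta}_{i,j}$ and $\theta_{i,j}$ are the estimated and observed water contents
at point $i$ on date $j$, and $s_j^2$ is the variance of the observed water content data
across the microwatershed on date $j$. This function adds the error of prediction of water
content across all the points $i$ of the microwatershed and across all the dates of
interest $j$, scaling the square of each residual by the variance of the observed water
content data across the microwatershed on the corresponding date. This allowed us to
combine errors across different dates.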
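
A minimal sketch of the SMSE computation is given below. The dictionary-based data layout and the function name are assumptions chosen for readability; estimates are taken as given, however they were interpolated.

```python
from statistics import variance

def smse(observed, predicted):
    """Scaled mean squared error over all dates and estimated locations.

    observed, predicted: dicts {date: {location: value}}. predicted may omit
    locations (for example the sampled subset or points with missing data).
    """
    total, n = 0.0, 0
    for date, obs in observed.items():
        s2 = variance(list(obs.values()))   # sample variance of that date's data
        for loc, est in predicted.get(date, {}).items():
            if loc in obs:
                total += (est - obs[loc]) ** 2 / s2
                n += 1
    return total / n

# Toy example with hypothetical values
observed = {"d1": {1: 1.0, 2: 2.0, 3: 3.0}}
predicted = {"d1": {2: 2.5, 3: 2.0}}
value = smse(observed, predicted)
```

Dividing each squared residual by its date's variance keeps a wet date with large absolute variability from dominating the sum.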

We  explored  the  four  possible  scenarios  defined  by  combinations  of  the  two  fitness 
functions  (SKV,  SMSE)  and  two  algorithms  (S&S,  SSA).  We  ran  five  repetitions 
(instances)  per  scenario,  differing  in  their  initial  conditions  and  in  the  random  numbers 
used  throughout  the  process.  The  corresponding  parameters  are  shown  in  the  Appendix. 
Validation 

The  optimal  subset  of  D  was  used  to  estimate  the  water  content  in  the  top  one  meter 
of  soil  on  January  25,  1993  and  December  23,  1993.  These  two  dates  had  not  been  used 
in  the  calibration  process.  As  in  the  calibration  phase,  we  estimated  water  content  in  the 
47 points not in the subset using ordinary kriging as a spatial interpolator. However, we
used the scaled semivariogram, multiplying its nugget effect and scale by the variance of
the data observed in the optimal 10-point subset under consideration.

We evaluated the optimal subsets obtained from the four algorithm / fitness
function combinations (SKV-S&S, SKV-SSA, SMSE-S&S, SMSE-SSA). In order to
observe the benefits of applying our proposed method, we also evaluated three regular
grids (Table 4-1) and 132 randomly generated subsets. We calculated relative errors

$$\varepsilon_r = \frac{\hat{\theta}_{i,j} - \theta_{i,j}}{\theta_{i,j}} \cdot 100\%$$

and standardized residuals for each location and validation date, tested
for bias, and plotted the standardized residuals vs. the estimated total soil water content
for each validation date to check for trends in the estimation. We also calculated the
Shapiro-Wilk W statistic (Shapiro & Wilk, 1965) to verify whether the residuals were
normally distributed, and tested for heteroscedasticity using regression between the mean
estimated soil water content and the variance of 4-point clusters of adjacent points,
following Goovaerts (1997). In order to verify compliance with kriging assumptions, we
calculated histograms of the residuals divided by their respective kriging standard
deviation, and checked for normality, zero mean, and unit variance.
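
The error measures used in this step can be sketched as follows. The function names and the toy numbers are assumptions; the normality and heteroscedasticity tests are not reproduced here.

```python
from statistics import mean, variance

def relative_errors(observed, estimated):
    """Relative error in percent: (estimate - observation) / observation * 100."""
    return [(e - o) / o * 100.0 for o, e in zip(observed, estimated)]

def fraction_within(errors, band):
    """Fraction of errors whose absolute value does not exceed band (percent)."""
    return sum(abs(e) <= band for e in errors) / len(errors)

def kriging_standardized(observed, estimated, kriging_sd):
    """Residuals divided by their kriging standard deviation.

    If the kriging assumptions hold, these should have a mean near 0
    and a variance near 1.
    """
    z = [(e - o) / sd for o, e, sd in zip(observed, estimated, kriging_sd)]
    return z, mean(z), variance(z)

# Toy example with hypothetical water contents (mm)
obs = [200.0, 100.0, 50.0]
est = [208.0, 97.0, 60.0]
rel = relative_errors(obs, est)
share_5 = fraction_within(rel, 5.0)
```

A positive relative error under this sign convention means that the water content was over-predicted.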

To  put  the  results  into  context,  we  also  checked  for  temporal  stability  as  defined  by 
Vachaud  et  al.  (1985),  using  temporal  analysis  of  the  differences  between  individual  and 
spatial  averages,  and  performing  Spearman's  rank  correlation  on  the  data  of  all  possible 
pairs  of  the  five  available  measurement  dates. 

Table 4-1. Locations contained in the most relevant patterns mentioned in the text,
together with their values of scaled mean squared error (SMSE) and scaled
kriging variance (SKV) over the calibration (subscript c) and validation
(subscript v) data sets.

Pattern            S1  S2  S3  S4  S5  S6  S7  S8  S9  S10   SKVc    SKVv    SMSEc   SMSEv
Best SKV-based      9  11  13  22  25  34  37  43  49  55    0.3276  0.1441  0.7559  0.6336
Best SMSE-based     5   7  13  16  30  37  41  50  53  56    0.3844  0.4268  0.3635  0.5174
Regular grid #1     3   7  15  19  29  37  40  47  51  57    0.3467  0.1723  0.7537  0.7117
Regular grid #2     4   7   8  23  26  32  40  45  53  56    0.3549  0.2072  0.6187  0.5286
Regular grid #3     7   8  11  27  30  41  44  51  53  56    0.3590  0.2594  0.7297  0.6814
Best random grid    5   9  13  17  28  33  34  45  55  57    0.3731  0.3730  0.5042  0.5040

Results and Discussion

Semivariograms 

For each date of interest, the semivariogram model that produced the best fit was an
exponential model. The corresponding parameters are shown in Table 4-2. Due to the
unavailability of data pairs with lag distances below 41.66 meters, it was difficult to
assess the existence of a nugget effect. We assumed a zero nugget throughout, based on
the great similarity among the repetitions of water content measurements that were
pooled for each point at each measurement date (not shown). The scaled semivariogram
model coalesced from the three calibration set semivariograms was

$$\gamma(h) = C_0 + C_1 \left[ 1 - e^{-3h/a} \right],$$

with $C_0 = 0$ (nugget), $C_1 = 0.1082$ (scale), and $a = 176$ m (effective range).


Table 4-2. Mean total soil water content in the first meter of soil, phenological stage, and
semivariogram model parameters per measurement date, and parameters for
the scaled semivariogram model.

                                                          Semivariogram parameters
Date          Mean water content   Phenological stage           C0    C1      a
Feb 7 1992    217 mm               V4 (4th node on main stem)   0     520     110 m
Feb 24 1992   228 mm               V6 (6th node on main stem)   0     650     130 m
Mar 20 1992   253 mm               R5 (Beginning seed)          0     1115    130 m
Jan 25 1993   196 mm               R2 (Full flower)             0     270     135 m
Dec 23 1993   142 mm               Planting                     0     320     100 m
Scaled SV     N/A                  N/A                          0     1.082   176 m

Density  Reduction 

Figure 4-2 shows the progress of the five instances of one of the scenarios (SMSE,
SSA). Note how the length of the process varied among instances, but the final value of
the fitness function they all arrived at was approximately the same. This behavior was
consistent among all scenarios. However, in both the SKV and SMSE-based scenarios
the five instances of the S&S algorithm reached their optimum value faster than any of
the SSA instances (not shown), probably due to the S&S algorithm's deterministic
selection of the exiting point of the subset. This characteristic makes the S&S algorithm
prone to converging towards local minima when the control parameter $\pi$ has reached low
values; the effect is countered by allowing $\pi$ to increase automatically, and with it the
probability of escaping a local minimum, when the fitness function has not improved over
a given number of iterations. This automatic increase in $\pi$ typically resulted in noisy
output that did not converge to the optimum; the optimum would be reached at some
intermediate point, and the algorithm would oscillate thereafter until the cutoff limit of
$M$ iterations without changes in $\pi$ was reached. In contrast, the SSA algorithm took
longer to reach its optimum, but always converged toward it. This is consistent with the
asymptotic convergence proven by Aarts and Korst (1990) for simulated annealing
algorithms using the Metropolis perturbation method, which is fully stochastic, unlike the
partially deterministic S&S method. The parameterization of the SSA algorithm was also
simpler than that of the S&S algorithm (see the Appendix). Both the S&S and SSA
algorithms found the same optimum pattern for the SMSE criterion. However, only the SSA
algorithm arrived at the optimal pattern shown for the SKV criterion; the S&S solutions
were slightly inferior.

There  was  some  variability  (not  shown  in  Figure  4-3)  among  the  results  of  the  five 
instances  of  the  process  for  each  of  the  four  calibration  scenarios,  depending  on  the  initial 
conditions  of  the  process  and/or  the  sequence  of  random  numbers  involved.  This  suggests 
that  the  chosen  parameters  may  have  quenched  the  system  too  rapidly,  despite  the  fact 
that,  in  the  case  of  the  S&S  algorithm,  our  chosen  parameter  set  (see  the  Appendix)  was 
more  conservative  and  thus  should  converge  more  slowly  than  the  one  proposed  by  Sacks 
and Schiller (1988). This leads us to recommend repeating the process for different initial
conditions and parameter values, especially if the S&S algorithm is being used.

Figure 4-2. Progress of the five instances of the scenario defined by the scaled mean
squared error (SMSE) fitness function and the Spatial Simulated Annealing
(SSA) algorithm. The missing points correspond to patterns (10-location
subsets) that were penalized, by adding a large number to their fitness
function, for not predicting the water content of all the remaining 47 locations
in the microwatershed. This penalization kept the algorithms from artificially
reducing the value of their fitness function by minimizing the number of
error-contributing estimates. The algorithms could do this by clumping the locations
and leaving parts of the microwatershed beyond the maximum kriging search
radius.

Figure 4-3. Results of the calibration (left) and validation (right) processes for each of the
four scenarios, shown as scaled kriging variance (SKV, top) and scaled mean
squared error (SMSE, bottom). Left panels: results of the density reduction
process on the calibration data set. Right panels: application of the optimal
calibration-phase patterns to estimate the water content in the validation set
(using the scaled semivariogram). Since calibration used 3 sets of
measurements and validation only 2, the absolute values of the errors shown
in the left and right panels should not be compared with one another. Results
for three regular grids and 132 random patterns (median and range) were
added for contrast.

The results of the density reduction process on the calibration data set are shown on
the left half of Figure 4-3. Observe how the values of the scaled kriging variance were
quite similar across the optima of the different calibration scenarios as well as the regular
grids and the 132 random patterns. This is in great measure explained by two factors:
i) we set a 300-meter maximum search radius in the GSLIB kb2d routine (Deutsch and
Journel, 1992) used for kriging, and ii) in the case of the randomly generated patterns, we
did not consider patterns that did not contain estimates for all of the 47 points belonging
to $(D - S)$. The kb2d routine will not predict values for points located farther than the
maximum search radius from any existing data point, so patterns with missing estimates
would have very poorly distributed points and consequently poor (high) SKV values.

During the calibration process, we penalized patterns that did not predict all of the
47 points belonging to $(D - S)$ by adding a large number to their fitness function.
This was necessary in order to keep the algorithms from reducing the value of their
fitness function by minimizing the number of estimates that contributed to it. Thus, very
concentrated arrangements of points were avoided, and kriging variances were
correspondingly low.
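
The penalization can be illustrated with the short sketch below. The penalty value, the function names, and the simple distance test standing in for the kriging search are assumptions; only the 300-meter radius comes from the text.

```python
import math

def n_estimable(subset_xy, other_xy, radius=300.0):
    """Count non-sampled points within the search radius of a sampled point."""
    return sum(any(math.dist(p, q) <= radius for q in subset_xy) for p in other_xy)

def penalized_fitness(base_value, subset_xy, other_xy, radius=300.0, penalty=1.0e6):
    """Add a large number (assumed value) when some point cannot be estimated."""
    if n_estimable(subset_xy, other_xy, radius) < len(other_xy):
        return base_value + penalty
    return base_value

# Toy example: one sampled point, one reachable and one unreachable target
sampled = [(0.0, 0.0)]
targets = [(100.0, 0.0), (400.0, 0.0)]
value = penalized_fitness(0.5, sampled, targets)
```

Without the penalty, a pattern that leaves part of the field unestimated would be averaged over fewer residuals and could look better than it is.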

As shown in Figure 4-3, SMSE results were more variable. The SMSE of
prediction of scenarios in which SKV was optimized was over 50% greater than for
scenarios where SMSE was optimized. The three regular grids produced similar results to
the  SKV  scenarios,  and  the  132  random  patterns  produced  highly  variable  results,  always 
with  at  least  39%  more  error  over  the  calibration  set  than  the  SMSE-calibrated  scenarios. 
These  results  will  be  discussed  below  together  with  those  of  the  validation  step. 
Validation 

The  right  half  of  Figure  4-3  shows  the  results  of  applying  the  optimal  patterns 
obtained  in  the  calibration  phase  to  estimate  water  content  in  the  validation  set  using  the 
scaled  semivariogram.  Note  that  the  left  half  of  Figure  4-3  was  created  using  data  from 
three  dates  and  the  right  half  was  made  using  only  two,  so  the  absolute  values  of  the 
errors  shown  in  each  half  should  not  be  compared  with  one  another. 

Figure 4-4. Maps of interpolated water content for both validation dates. The top row
shows the observed data, and the three rows below it show the predictions
corresponding to the best SMSE scenario, the best SKV scenario, and the best
regular grid. For both dates, the best SMSE scenario reproduced the observed
spatial variability more accurately than the others, especially the wetter
southeast sector of the field. The ten locations composing each scenario's
pattern are marked.

As expected, the observed SKV behavior is similar to that of the calibration set:
lower  for  the  SKV-calibrated  scenarios  than  for  the  SMSE-calibrated  scenarios.  The  three 
regular  grids  had  somewhat  higher  values  than  the  SKV-calibrated  values,  and  the 
random  patterns  produced  very  variable  results,  with  SKV  values  even  lower  than  those 
produced  by  the  optimal  SKV-calibrated  patterns.  Twelve  of  the  random  patterns  had 
better  values  of  SKV  in  the  validation  set  than  the  optimum.  However,  they  tended  to 
have  high  values  of  SMSE  in  both  data  sets,  and  had  higher  values  of  SKV  in  the 
calibration  set  than  the  optimal  SKV-calibrated  pattern. 

Figure  4-4  shows  maps  of  interpolated  water  content  for  the  two  validation  dates. 
The  top  row  shows  the  observed  data,  and  the  three  rows  below  it  show  the  results  for  the 
best  SMSE  scenario,  the  best  SKV  scenario,  and  the  best  regular  grid.  For  both  dates,  the 
best  SMSE  scenario  reproduces  the  observed  spatial  variability  more  accurately  than  the 
others,  especially  the  wetter  southeast  sector  of  the  field  (due  to  the  simultaneous 
presence  of  points  5,  7  and  13  in  the  pattern). 

These maps point out the inherent limitations of kriging from a very small subset of
data: reproducing spatial water content variability in 8 hectares with only 10 points can
capture limited detail. However, the SMSE-calibrated method attained a relatively low
error of prediction. Figure 4-5 shows the map of relative error of prediction
$\varepsilon_r$ of the best SMSE-calibrated scenario for both validation dates, and Figure
4-6 shows the distribution at each validation date of $\varepsilon_r$ in the estimated
points not belonging to the optimal pattern for both the optimal SKV-calibrated and the
optimal SMSE-calibrated scenarios. Values of $\varepsilon_r$ were mostly low across both
dates: for the best SMSE scenario, on Jan 25, 1993, 50% of the relative errors fell within
±5%, 82% within ±10%, and 6.5% fell outside ±15%. On Dec 23, 1993, 35% fell within ±5%,
72% within ±10%, and 23% fell outside ±15%. There was better prediction accuracy on
January 25 than on December 23, and the SMSE-calibrated scenario had a lower fraction of
large errors (defined as having $|\varepsilon_r| > 15\%$) than the SKV-calibrated scenario
on both dates.

Figure 4-5. Map of relative prediction error of the best SMSE-calibrated scenario for both
validation dates (panels: January 25, 1993 and December 23, 1993).

Figure 4-6. Distribution, for each validation date, of relative prediction error in the
estimated locations not belonging to the optimal pattern for the optimal
SKV-calibrated and the optimal SMSE-calibrated scenarios.

The method's better performance on January 25 may be related to that date's higher
mean soil water content relative to December 23. As shown in Table 4-2, both dates had
lower mean soil water content than the three dates used for calibration. However, the
December 23 mean field water content is especially low, 142 mm/m on average. This
suggests that the predictive capability of the method may degrade for soil moisture
scenarios beyond the method's range of calibration. However, extraneous factors may also
be contributing to the error: the three points shown in Figure 4-6 with
$\varepsilon_r > +15\%$ (large over-prediction of water content) in the SMSE-calibrated
scenario on December 23 correspond to extremely dry points (44, 48, 57) near the field
border that had water contents in the first meter of $\theta_v \approx 0.115$, the
permanent wilting point for a Haplustoll in this area. This water extraction pattern near
the field boundary is consistent with the presence of weeds during the winter.

Residual  Analysis  of  Validation  Results  and  Tests  of  Kriging  Assumptions 

Figures  4-7A  and  4-7B  show  scatter-plots  of  standardized  residuals  vs.  estimated 
soil  water  content  for  each  validation  date,  Figure  4-7C  shows  the  variance  for  both 
validation  dates  of  the  same  residuals  in  groups  of  4  values  set  up  for  a  test  of 
heteroscedasticity,  and  Figure  4-7D  shows  a  histogram  of  the  residuals  standardized  by 
their  kriging  standard  deviation  for  both  validation  dates.  Table  4-3  summarizes  the 
results of the significance tests performed on these data. At the 95% confidence level, the
residuals standardized by their kriging standard deviation for December 23 had a mean
that was significantly different from 0 and a variance significantly different from 1. The
standardized residuals on January 25 had a statistically significant trend with respect to
the estimated water content. The former implies that the kriging assumptions were not
fully respected for December 23, and the latter that on January 25 the proposed model
does not capture all the phenomena causing spatial variability of water content. This is
also true for December 23, given its comparatively larger fraction of high errors when
compared with January 25.

Figure 4-7. Residual analysis of validation results and tests of kriging assumptions for
both validation dates. Figures 4-7A and 4-7B show standardized residuals vs.
estimated soil water content. Figure 4-7C shows the variance of the residuals (in
groups of 4) set up for testing heteroscedasticity. Figure 4-7D shows a
histogram of the residuals standardized by their kriging standard deviation.

Table 4-3. Results of residual analysis.

Temporal Stability Analysis

Vachaud et al. (1985) plotted the ranked means and variances of $\delta_{i,j}$, the
relative difference between the observed water content $\theta_{i,j}$ at each location $i$
at time $j$ and the field-wide mean observed water content $\bar{\theta}_j$:

$$\delta_{i,j} = \frac{\theta_{i,j} - \bar{\theta}_j}{\bar{\theta}_j}.$$

The mean of this value over time for a given location was labeled $\bar{\delta}_i$ (mean
relative difference) and its standard deviation, $\sigma(\delta_{i,j})$. Figure 4-8
presents the measured data for the five dates pooled together for each location, with the
$\bar{\delta}_i$ values ranked in ascending order from left to right. Note how certain
locations systematically either overestimate ($\bar{\delta}_i - \sigma(\delta_{i,j}) > 0$)
or underestimate ($\bar{\delta}_i + \sigma(\delta_{i,j}) < 0$) the microwatershed average
soil water content, irrespective of the observation date. This figure also puts the points
comprising the optimal SKV-based subset and the optimal SMSE-based subset into the
context of the temporal stability of the whole microwatershed. Note how the SMSE-based
solution captures the full range of variation of the mean relative differences.
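
The relative-difference statistics can be computed as in the sketch below. The dictionary layout, the function name, and the toy values are assumptions made for illustration.

```python
from statistics import mean, stdev

def relative_differences(theta):
    """Mean and standard deviation over time of the relative difference.

    theta: dict {date: {location: water content}}.
    Returns {location: (mean relative difference, standard deviation)}.
    """
    deltas = {}
    for date, obs in theta.items():
        field_mean = mean(list(obs.values()))        # field-wide mean on this date
        for loc, value in obs.items():
            deltas.setdefault(loc, []).append((value - field_mean) / field_mean)
    return {loc: (mean(d), stdev(d) if len(d) > 1 else 0.0)
            for loc, d in deltas.items()}

# Toy example with hypothetical water contents (mm) on two dates
theta = {"a": {1: 90.0, 2: 110.0}, "b": {1: 180.0, 2: 220.0}}
stats = relative_differences(theta)
ranked = sorted(stats, key=lambda loc: stats[loc][0])   # driest to wettest
```

A location whose mean plus one standard deviation stays below zero is consistently drier than the field average, and one whose mean minus one standard deviation stays above zero is consistently wetter.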

Table  4-4  shows  the  results  of  the  Spearman  rank  correlation  test  among  the  five 
dates  taken  two  at  a  time.  Correlation  is  significant  through  all  combinations  of  two  dates, 
indicating  temporal  stability  of  the  soil  water  patterns.  Unsurprisingly,  correlation  is 
somewhat  higher  among  the  three  dates  corresponding  to  the  same  cropping  season. 
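
For reference, Spearman's rank correlation between two dates can be sketched in pure Python as the Pearson correlation of the ranks. The function names are assumptions, and the sketch does not compute the p-values reported in Table 4-4.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Here `x` and `y` would hold the water contents measured at the same locations on two different dates, with locations missing on either date removed beforehand.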


Table 4-4. Spearman's rank correlation tests for temporal stability. Each cell represents
the results of the test performed between the data of the dates shown for the
corresponding row and column. The top number is the rank correlation
coefficient, and the bottom number is the p-value. Note that all of the results
are significant, indicating temporal stability. The average total soil water
content (over the first meter of soil) at each date is shown for reference.

Dates:                    Feb 7 1992   Feb 24 1992   Mar 20 1992   Jan 25 1993   Dec 23 1993
Mean water content (mm)   217.2        228.0         254.0         195.3         145.3

Feb 7 1992                1            0.692         0.520         0.550         0.397
                                       p < 0.001     p < 0.001     p < 0.001     p < 0.001
Feb 24 1992               0.692        1             0.506         0.358         0.436
                          p < 0.001                  p < 0.001     p < 0.001     p < 0.001
Mar 20 1992               0.520        0.506         1             0.400         0.354
                          p < 0.001    p < 0.001                   p < 0.001     p < 0.001
Jan 25 1993               0.550        0.358         0.400         1             0.356
                          p < 0.001    p < 0.001     p < 0.001                   p < 0.001
Dec 23 1993               0.397        0.436         0.354         0.356         1
                          p < 0.001    p < 0.001     p < 0.001     p < 0.001

Sources  of  Error  and  Nonstationarity 

The fact that the SMSE-based solution appears better than the SKV-based solution
is meaningful. Formally, given enough realizations, the SKV-based method would be
expected to also minimize the SMSE. This is the rationale behind optimal sampling
network construction algorithms that minimize kriging variance (van Groenigen et al.,
1999; McBratney et al., 1981). However, as pointed out by Deutsch and Journel (1992),
SKV is not necessarily the best measure of local accuracy because it does not take the
actual data values into account. Our result may reflect two possibilities: either there are
not enough joint realizations of the random variable of interest and the error of the
particular available realizations deviated from its expected value (given by the kriging
variance), or else the random variable is spatially non-stationary. The former cannot be
ruled out and is perhaps inevitable in a study with a limited number of measurement
dates. The latter is probable; as shown above, there was a violation of one of the kriging
assumptions on the second validation date.

Figure 4-8. Ranked intertemporal relative deviation from the mean (across the
microwatershed) spatial soil water content, $\bar{\delta}_i$. The measured data for the 5
dates are pooled together for each location, with the $\bar{\delta}_i$ values ranked in
ascending order from left to right. The locations comprising the optimal
SKV-based subset and the optimal SMSE-based subset are marked with arrows.

The violation of spatial stationarity is further supported by the temporal stability of
the spatial pattern of soil water over the watershed. As seen in Figure 4-8, some
points such as #5 are consistently wetter than the rest, and some points such as #41 are
consistently drier than the rest. The SKV-based scenarios ignored this, and produced an
optimal subset that covered a limited fraction of the total range of mean relative
difference. In contrast, the SMSE-based scenario covered the extremes, capturing a range
40% greater than in the SKV case. It incorporated the points that most strongly violated
the stationarity assumption and used them to reduce its error of prediction.

The abovementioned nonstationarity is a result of different processes at work
throughout the microwatershed. Reynolds (1970) enumerated several static and dynamic
factors  that  affect  spatial  variability  of  soil  water  content,  and  Mohanty  et  al.  (2000) 
equated  this  to  factors  governing  time  stability.  Kachanoski  &  De  Jong  (1988)  contended 
that  spatial  variability  of  hydrological  processes  degrades  time  stability  of  soil  moisture 
patterns  in  landscapes  with  topographic  redistribution  of  soil  water.  This  argument  was 
supported  by  results  from  Grayson  and  Western  (1998)  and  Mohanty  et  al.  (2000),  who 
worked  with  catchments  involving  topographically  routed  lateral  redistribution  of  soil 
moisture. In our case, the rank correlation coefficient values of the tests for temporal
stability shown in Table 4-3 were high enough to indicate temporal stability, but this
stability does not extend to the entire field: several points have highly variable values
of δ, resulting from a variety of processes. Point 57, for example, is on a header row and
has a highly variable soybean stand quality as a result of the maneuvering of farm
equipment, and point 33 can form part of a waterway during intense rainstorms.

Topography is influential in the microwatershed under study: elevation decreases
eastward, and there is also a small embankment on its eastern edge. Contributing area
increases and slope decreases eastward, consistent with the observed tendency for
ponding near the eastern edge of the microwatershed. It is therefore not surprising that
the  southeastern  sector  of  the  field  has  a  highly  variable  water  content,  and  that 
consequently  the  SMSE-based  algorithm  concentrated  several  points  (#5,  #7,  #13)  there 
to  maximize  its  predictive  accuracy. 

Although  our  results  are  encouraging,  purely  statistical  models  may  be  unable  to 
fully capture soil water variability, especially following rainfall events that generate
significant surface and/or subsurface flow. Indeed, Kachanoski and De Jong (1988)
observed  that  soil  drying  did  not  alter  the  spatial  pattern  of  soil  water  content,  but  time 
stability  during  recharge  was  scale  dependent,  correlating  with  surface  curvature  at 
spatial  scales  below  a  certain  distance  threshold  (40  meters).  This  altered  the  spatial 
pattern  of  soil  water  content  during  recharge  at  large  (fine)  scales  but  not  at  small 
(coarse)  scales. 

A  convenient  method  of  expressing  topographical  influence  on  soil  water 
distribution  is  by  using  topographic  indices  such  as  the  one  proposed  by  Beven  and 
Kirkby  (1979).  Their  index  represents  the  effects  of  variable  contributing  upslope  areas 
and  the  slope  at  the  point  of  interest.  Nyberg  (1996)  found  strong  correlation  between  this 
topographic  index  and  soil  water  content.  Crave  and  Gascuel-Odoux  (1997)  successfully 
linked  water  content  with  a  topographic  index  referring  to  downslope  conditions,  defined 
as  the  elevation  difference  between  the  point  of  interest  and  the  outlet  of  the  water 
pathway. In contrast, Ladson and Moore (1992) concluded that temporally varying soil
water content was not well predicted by simple static topographic attributes over a gently
sloping catchment in the Konza Prairie, Kansas.

Subtracting  a  trend  from  the  data  and  kriging  only  the  residual  component  can  be 
an  effective  means  of  minimizing  the  effect  of  nonstationarity  (Goovaerts,  1997). 
However,  in  this  case  the  trend  itself  is  temporally  and  spatially  variable  given  its 
dependence on hydrological processes. A simple physically based model such as the Beven
and Kirkby model (Beven et al., 1995; Beven and Kirkby, 1979), extended with a crop
simulation model such as CROPGRO (Boote et al., 1998), SUCROS (van Laar et al.,
1992), or GLYCIM (Acock and Trent, 1991) to account for the effects of crop cover, may
be a valuable tool for generating temporally variable trend surfaces.

Conclusions 

We  combined  the  scaled  semivariogram  technique  with  two  simulated  annealing 
algorithms  to  reduce  the  number  of  locations  necessary  to  describe  water  content  in  our 
8-hectare  study  area  from  57  down  to  10  points.  The  scaled  semivariogram  allowed  us  to 
incorporate  data  from  several  dates,  both  to  reflect  time-independent  behavior  of  water 
content  and  to  compensate  for  the  relatively  small  size  of  the  individual  datasets. 

Of  the  two  simulated  annealing  algorithms,  Spatial  Simulated  Annealing  (van 
Groenigen  and  Stein,  1998)  produced  more  consistent  results  than  the  Sacks  and  Schiller 
method  (Sacks  and  Schiller,  1988),  although  the  solutions  provided  by  both  algorithms 
were  quite  similar.  Running  multiple  instances  of  the  optimization  process  is 
recommended,  especially  if  using  the  Sacks  &  Schiller  method. 

Our  proposed  method  predicted  water  content  across  the  validation  set  with 
relatively low errors: over 70% of all the predicted water contents had an error within
±10%, which is acceptable for the application it was designed for. The method also
captured the spatial variability of water better than regular grids or randomly generated
patterns. However, the SMSE (scaled mean squared error)-based scenarios performed
better than the scenarios using an SKV (kriging variance) criterion.

We  detected  temporal  stability  in  the  dataset.  This  phenomenon  implies  the 
existence  of  spatial  nonstationarity  of  the  water  content  across  the  field,  leading  to  the 
violation  of  kriging  assumptions  and  the  degradation  of  the  quality  of  kriging  estimates. 
However,  the  SMSE-based  optimization  scenarios  incorporated  temporally  stable 
extreme  (wet  and  dry)  points  into  the  optimal  subset,  using  them  to  capture  the 
nonstationary  behavior. 

Our  method  may  be  improved  by  combining  a  spatial  water  movement  model  with 
a  crop  simulation  model  to  provide  a  temporally  variable  trend  that  can  be  used  to 
eliminate  possible  spatial  nonstationarity. 


CHAPTER  5 

A  FASTER  ALGORITHM  FOR  CROP  MODEL  PARAMETERIZATION  BY 
INVERSE  MODELING:  SIMULATED  ANNEALING  WITH  DATA  REUSE 

Introduction 

Crop  simulation  models  have  been  proposed  as  valuable  tools  for  understanding  the 
causes  of  spatiotemporal  yield  variability  and  developing  optimized  forms  of 
management  (Batchelor  et  al.,  2002).  A  major  challenge  for  such  endeavors  is  the 
determination  of  the  necessary  soil  parameters:  in  practical  precision  agriculture 
applications  these  inputs  cannot  be  measured  due  to  cost  constraints,  and  must 
consequently  be  estimated. 

Welch  et  al.  (1999a)  proposed  an  inverse-modeling-based  method  for  estimating 
crop  model  genetic  coefficients.  The  method  exhaustively  simulated  all  the  parameter 
combinations  in  a  discrete  input  space,  and  then  examined  the  results  to  find  the  best  set 
of  parameters  for  each  crop  variety.  The  "best"  parameter  combination  was  defined  as  the 
one  producing  the  minimum  value  of  an  objective  function  (OF)  defined  as  the  sum  of 
squared  residuals  between  simulated  and  observed  data. 

Irmak  et  al.  (2001)  expanded  on  the  grid  search  concept,  using  it  to  estimate  soil 
properties.  They  compared  grid  search  results  with  those  of  adaptive  simulated  annealing 
(Ingber, 1993), a sophisticated search method which had been used previously by other
authors to parameterize crop models (Braga, 2000; Calmon et al., 1999; Paz et al., 2001).
Irmak  et  al.  (2001)  presented  an  example  problem  from  a  field  in  Iowa  involving  five 
parameters and two years of observed crop yield data, solving it as follows:

• Discretize the input space into 74,536 combinations, describing the range of
variation of one parameter with seven points, another parameter with eight, and the
remaining three with 11 points each.

• Run the CROPGRO model (Boote et al., 1998) for each parameter combination and
two weather years.

•  Apply  landscape-position-dependent  rules  to  reduce  the  number  of  parameters  to 
estimate,  based  on  expected  parameter  sensitivity.  For  example,  in  the  second  case 
study  presented,  three  parameters  were  estimated  for  a  soil  described  as  not  having 
drainage  problems:  the  SCS  CN2  curve  number  (USDA,  1972),  SLPF  (a  soil 
fertility  factor)  and  SLB  (maximum  rooting  depth).  If,  on  the  other  hand,  the  soil 
had  drainage  problems  (but  did  not  have  tile  drainage),  only  SLPF  and  KSAT 
(saturated  hydraulic  conductivity)  would  be  estimated,  assuming  that  the  profile's 
capacity  to  get  rid  of  excess  water  as  expressed  by  KSAT,  would  be  a  more 
sensitive  parameter  than  a  runoff  parameter  or  the  rooting  depth. 

• Choose as the optimum the rule-compatible parameter combination having the lowest
root-mean-squared error (RMSE) between CROPGRO-predicted and observed yield
across the two years.

• Run the process independently for each of the 11 distinct environments (soil types)
in the field.
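
The grid search steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of Irmak et al. (2001): `toy_model`, the two-parameter grid, the location labels, and the observed values are hypothetical stand-ins for a crop model and field data, and the landscape-position rules are omitted.

```python
import itertools
import math


def rmse(simulated, observed):
    """Root-mean-squared error between two equal-length yield sequences."""
    return math.sqrt(
        sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed)
    )


def grid_search(model, axes, observed_by_location):
    """Simulate every combination once, then search the stored results per location."""
    # Expensive part: one model run per parameter combination.
    results = {combo: model(combo) for combo in itertools.product(*axes)}
    best = {}
    for location, observed in observed_by_location.items():
        # Cheap part: look up the stored combination with the lowest RMSE.
        best[location] = min(results, key=lambda combo: rmse(results[combo], observed))
    return best, len(results)


def toy_model(params):
    """Hypothetical stand-in for a crop model: 'yields' for two weather years."""
    a, b = params
    return (2.0 * a + b, a + 3.0 * b)


axes = [[0.0, 0.5, 1.0, 1.5, 2.0], [0.0, 1.0, 2.0, 3.0]]
observed = {"L11": (4.0, 7.0), "L12": (2.0, 3.5)}
best, n_runs = grid_search(toy_model, axes, observed)
```

The number of model runs depends only on the size of the discretized space, not on the number of locations, which is the property the following paragraphs rely on.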

Irmak  et  al.  (2001)  reported  that  runtime  for  the  grid  search  implementation 
described  above  was  less  than  half  that  of  adaptive  simulated  annealing.  Although  the 
grid  search  method  requires  simulating  the  whole  input  space  in  order  to  provide  a 
parameter  estimate,  and  can  thus  be  considered  very  inefficient,  these  results  can  be 
understood  in  the  precision  agriculture  context  in  which  parameters  are  sought  for 
multiple locations. Consider the hypothetical field shown in Figure 5-1, divided into a
series of distinct environments (for example, soil map units) E1, E2, and E3. Each
environment i contains one or more locations of interest L_ij where model
parameterization is desired. Locations within a distinct environment are considered to
have similar but not identical soil properties, so the same discretized parameter space is
used for all of them. A complete set of crop model runs is performed only once per
environment, and a rapid search within the results yields parameter estimates for all the
locations of interest within each environment.

Figure 5-1. A hypothetical field divided into three environments (soil types) E1, E2, E3.
Each environment i contains a number of locations L_ij of interest. The grid
search algorithm runs the model to simulate the whole parameter space for
each environment i, and the corresponding set of answers is searched for the
minimum-RMSE parameter combination for each location of interest L_ij.

In  contrast,  the  adaptive  simulated  annealing  algorithm  used  by  Irmak  et  al.  (2001) 
was  run  independently  for  each  location  within  an  environment.  The  parameter 
estimation  process  for  any  given  location  could  not  reuse  the  OF  values  calculated 
previously for other locations. This could result in multiple (time-costly) crop model runs
for the same parameter combinations at different locations, which is inefficient given that
soil parameters are usually expected to be similar across locations within a soil type.
Furthermore, the simulated annealing algorithm was run on a continuous parameter
space, and so estimated parameters with a different (lower) level of uncertainty than the
grid search. In light of these implementation differences, a direct performance
comparison of the two algorithms is not appropriate. However, there is opportunity for
further speed improvement by combining the two methods, incorporating data reuse and
a discrete parameter space into a simulated annealing algorithm.

Considering  the  previously  mentioned  assumption  of  similarity  among  the  soil 
parameters  of  nearby  locations  in  a  mapping  unit,  our  working  hypotheses  were  that: 

•  The  runtime  of  a  simulated  annealing  algorithm  used  for  crop  model 
parameterization  would  be  greatly  reduced  if  it  were  allowed  to  re-utilize 
simulation  results  across  locations  within  an  environment,  and 

• A simulated annealing algorithm thus modified could have similar accuracy and a
significantly lower runtime than the grid search algorithm over a wide range of
numbers of locations per environment.

The  objective  of  this  study  was  to  implement  and  test  (relative  to  a  pure  grid 
search)  a  simulated  annealing  algorithm  modified  to  work  on  a  discrete  parameter  space 
and  reuse  previously  simulated  results. 

Materials  and  Methods 
Simulated  Annealing  Overview 

Simulated  annealing  (SA)  is  a  combinatorial  optimization  algorithm  derived  from 
the  work  of  Metropolis  et  al.  (1953).  Our  study  implemented  a  slightly  modified  version 
of  the  algorithm  used  by  Ferreyra  et  al.  (2000),  which  in  turn  was  derived  from  the  spatial 
simulated  annealing  algorithm  of  van  Groenigen  and  Stein  (1998).  Simulated  annealing 
algorithms  have  four  main  components:  the  fitness  function,  acceptance  criterion, 
generation  mechanism,  and  cooling  schedule  (Aarts  and  Korst,  1990).  They  are  all  briefly 
described  below. 

Fitness function: the objective function (OF) to be minimized or maximized.

Acceptance criterion: Let x^j and x' be two vectors located in the parameter space such
that x' is generated by perturbing x^j. Let the corresponding fitness function values be
J(x^j) and J(x') respectively. The acceptance criterion determines whether x' replaces
x^j or not. In a minimization problem, the acceptance probability is defined as:

$$
P_c(x^j \to x') =
\begin{cases}
1 & \text{if } J(x') \le J(x^j) \\
\exp\left(\dfrac{J(x^j) - J(x')}{c}\right) & \text{if } J(x') > J(x^j)
\end{cases}
\qquad (5\text{-}1)
$$

where c is a positive control parameter (or function) that decreases as the algorithm
progresses.
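
As a hedged illustration of Equation 5-1, the acceptance test for a minimization problem might be coded as below. The function names and the use of Python's `random` module are assumptions made for this sketch; they are not taken from the original implementation.

```python
import math
import random


def acceptance_probability(j_current, j_candidate, c):
    """Probability of accepting the candidate in a minimization problem (Eq. 5-1)."""
    if j_candidate <= j_current:
        return 1.0  # improvements (and ties) are always accepted
    return math.exp((j_current - j_candidate) / c)


def accept(j_current, j_candidate, c, rng=random):
    """Draw a uniform number in [0, 1) and compare it with the acceptance probability."""
    return rng.random() < acceptance_probability(j_current, j_candidate, c)


# The same deterioration of 0.1 in the OF is likely to be accepted early on
# (c = 1) and very unlikely to be accepted late in the run (c = 0.01).
p_early = acceptance_probability(0.5, 0.6, 1.0)
p_late = acceptance_probability(0.5, 0.6, 0.01)
```

Because the exponent scales with 1/c, the criterion behaves almost like a random walk while c is large and almost like a greedy search once c is small.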

Generation mechanism: The pattern x' is generated from x^j by adding a vector u to
x^j, where u is a randomly generated vector such that x' = x^j + u corresponds to a valid
point in the discrete parameter space, and the length of the projection of u along the kth
coordinate axis does not exceed h times the total data range along that axis. Parameter h
typically takes an initial value of 1, and may decrease with time.
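
One possible reading of the generation mechanism on a grid of integer indices is sketched below. The details are assumptions made for illustration: steps are drawn uniformly, the reachable range is clipped at the grid edges rather than rejected, and at least one grid step is always allowed. The name `propose` is hypothetical.

```python
import random


def propose(current, sizes, h, rng=random):
    """Return a neighbor of `current` (a tuple of grid indices).

    Along axis k the step is at most h times the index range (sizes[k] - 1),
    and the result is kept inside the grid, so it is always a valid point.
    """
    if all(size == 1 for size in sizes):
        raise ValueError("the grid has a single point; nothing to perturb")
    while True:
        candidate = []
        for idx, size in zip(current, sizes):
            # Assumption: never shrink below one grid step.
            max_step = max(1, int(h * (size - 1)))
            low = max(0, idx - max_step)
            high = min(size - 1, idx + max_step)
            candidate.append(rng.randint(low, high))
        candidate = tuple(candidate)
        if candidate != current:  # a zero step would not be a perturbation
            return candidate


rng = random.Random(0)
near = propose((100, 100), (201, 201), 0.1, rng)  # within 20 cells per axis
anywhere = propose((0, 0), (201, 201), 1.0, rng)  # whole grid is reachable
```

With h = 1 every cell can be proposed from every other cell; as h shrinks, only a progressively smaller neighborhood remains reachable.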
Cooling schedule: The cooling schedule changes the value of c and h as the algorithm
progresses. A number of iterations of the algorithm are performed at each level of c and
h, after which a transition occurs and the parameters are updated:

$$
\begin{aligned}
c^{i+1} &= \alpha_c \cdot c^i \\
h^{i+1} &= \alpha_h \cdot h^i
\end{aligned}
\qquad (5\text{-}2)
$$

with α_c and α_h having values slightly less than 1.

The SA algorithm stops when c becomes less than a pre-established final value c^f.
The total number of c and h transitions is derived from Equation 5-2, according to:

$$
n = \operatorname{round}\left(\log_{\alpha_c} \frac{c^f}{c^0}\right) \qquad (5\text{-}3)
$$

The total number of iterations N is linked to n by the expression

$$
N = (n + 1)\, m \qquad (5\text{-}4)
$$

where m is the number of iterations performed at each value of c. It is desirable that m be
a  large  number.  This  gives  the  system  the  opportunity  to  reach  equilibrium  at  each  level 
of  c,  which  maximizes  the  probability  of  reaching  an  optimal  solution  (Ingber,  1993). 
Aarts  and  Korst  (1990)  provided  a  probabilistic  mechanism  for  calculating  an  optimal 
value  of  m,  but  in  this  study  m  was  used  as  an  independent  variable  because  of  the 
emphasis  placed  on  speed  and  control. 
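
The length of a run follows directly from Equations 5-2 to 5-4. The sketch below is illustrative only (the function names are hypothetical); it reproduces the iteration counts quoted for case study 1 with α_c = 0.995, c^0 = 1, and c^f = 0.0001.

```python
import math


def n_transitions(alpha_c, c0, cf):
    """Equation 5-3: number of cooling transitions needed to go from c0 to cf."""
    return round(math.log(cf / c0) / math.log(alpha_c))


def total_iterations(alpha_c, c0, cf, m):
    """Equation 5-4: m iterations at each of the n + 1 levels of c."""
    return (n_transitions(alpha_c, c0, cf) + 1) * m


def schedule(alpha_c, alpha_h, c0, h0, cf):
    """Yield the successive (c, h) levels defined by Equation 5-2."""
    c, h = c0, h0
    for _ in range(n_transitions(alpha_c, c0, cf) + 1):
        yield c, h
        c, h = alpha_c * c, alpha_h * h


n = n_transitions(0.995, 1.0, 0.0001)
iterations = {m: total_iterations(0.995, 1.0, 0.0001, m) for m in (5, 20, 100)}
```

Here n is 1,837 transitions, so m = 5, 20, and 100 give 9,190, 36,760, and 183,800 iterations respectively.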
Crop  Model  Management 

Successive  iterations  of  the  SA  process  began  with  a  parameter  set  provided  by  the 
generating  mechanism.  Since  the  proposed  parameter  values  could  have  been  presented 
to  the  crop  model  in  previous  iterations,  a  simple  data  structure  for  recording  model  runs 
was devised: a multidimensional array holding one Boolean variable (or "flag") for each
possible parameter combination was created. A 0 value of this flag indicated that the
corresponding parameter combination had not been previously explored; in that case the
crop model was run for that combination, its simulated yield values were stored in
multidimensional arrays of real numbers, and the flag was set to 1. Conversely, if the flag
value was 1, the parameter combination had been previously simulated, and the yield data
were retrieved from the arrays instead of being simulated by the crop model.
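
A minimal sketch of such a run record is shown below. It assumes the parameter combination is identified by a tuple of grid indices and that the crop model is available as a callable; the class name `RunCache`, its attributes, and the lambda used as a model are hypothetical, and flat Python lists stand in for the multidimensional arrays described above.

```python
class RunCache:
    """Flag array plus yield store, indexed by position in the parameter grid."""

    def __init__(self, sizes, model):
        self.sizes = sizes
        self.model = model            # hypothetical callable: index tuple -> yields
        total = 1
        for size in sizes:
            total *= size
        self.done = [False] * total   # the Boolean "flag" for each combination
        self.yields = [None] * total  # simulated yields, filled on first visit
        self.model_runs = 0
        self.hits = 0

    def _flat(self, index):
        """Row-major position of a multidimensional index (integer arithmetic only)."""
        pos = 0
        for i, size in zip(index, self.sizes):
            pos = pos * size + i
        return pos

    def simulate(self, index):
        """Run the model only if this combination has not been simulated before."""
        pos = self._flat(index)
        if not self.done[pos]:
            self.yields[pos] = self.model(index)
            self.done[pos] = True
            self.model_runs += 1
        else:
            self.hits += 1
        return self.yields[pos]


# Grid sized like case study 2 (17 x 17 x 17 x 16 combinations), toy model.
cache = RunCache((17, 17, 17, 16), lambda idx: (float(sum(idx)), 0.0))
first = cache.simulate((1, 2, 3, 4))   # triggers a model run
second = cache.simulate((1, 2, 3, 4))  # served from the stored results
```

Sharing one such cache across all locations in an environment is what allows later locations to reuse simulations requested by earlier ones.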
Case  Studies 

The performance of the grid search and simulated annealing methods was
compared in two case studies:

The  first  is  a  synthetic  one-environment,  one-location,  two-parameter  case  built  on 
a  201x201  parameter  space.  The  purpose  of  this  study  was  to  explore  how  different  SA 
parameter  values  and  the  size  of  the  parameter  space  change  the  solution  to  a  complex 
problem having numerous local optima. The effect of parameter space size was explored
by sub-sampling the original 201x201 grid to obtain smaller (e.g. 101x101, 51x51)
spaces. The problem used the OF shown in Figure 5-2, built with a Bessel function as
follows:

$$
z = J_0\left(\sqrt{x^2 + y^2} \cdot 14.93057\right) \qquad (5\text{-}5)
$$

The x and y ranges (-1.75 < x < 0.25; -0.25 < y < 1.75) were selected so the
maximum OF value was located away from the center of the parameter space. The intent
of this displacement was to generate a less favorable situation when α_h < 1. In such cases,
not every parameter space location can be reached from every other location, and
the center of the domain tends to be visited more often than the periphery. Positioning the
optimum away from the center of the domain makes the problem more difficult to solve.
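
The objective function of Equation 5-5 can be evaluated with the standard library alone, as in the sketch below. Several details are assumptions made for illustration: J0 is computed from its integral representation with a midpoint rule rather than a library routine, and the 201 x 201 grid is taken to include both end points of each range (a step of 0.01).

```python
import math

# Midpoint-rule nodes for J0(t) = (1/pi) * integral from 0 to pi of cos(t sin(theta)).
_N = 64
_SINES = [math.sin(math.pi * (k + 0.5) / _N) for k in range(_N)]


def bessel_j0(t):
    """Bessel function of the first kind, order zero, by numerical integration."""
    return sum(math.cos(t * s) for s in _SINES) / _N


def objective(x, y):
    """Equation 5-5."""
    return bessel_j0(math.sqrt(x * x + y * y) * 14.93057)


# Assumed grid: 201 points per axis, end points included.
xs = [-1.75 + 2.0 * i / 200 for i in range(201)]
ys = [-0.25 + 2.0 * j / 200 for j in range(201)]

best_value, best_i, best_j = max(
    (objective(x, y), i, j) for i, x in enumerate(xs) for j, y in enumerate(ys)
)
center_value = objective(xs[100], ys[100])
```

Under these assumptions the global maximum (OF = 1) sits at x = 0, y = 0, which is grid cell (175, 25), well away from the central cell (100, 100).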


Figure 5-2. Objective function used in case study 1.

The second case study consisted of a one-environment, 13-location, four-parameter,
two-year problem with 78,608 (17³ × 16) parameter combinations of the maize (Zea mays
L.) model CERES-Maize (Ritchie et al., 1998). The purpose of this case study was to test
the SA algorithm with a crop model and real data, and explore how its performance
varied with a growing number of locations per environment. This was a minimization
problem; the OF was the RMSE between predicted and observed yield of the 1999 and
2001 corn harvests in the Suggs 4 field, located near Murray, KY, USA (36° 32' N, 88°
27' W, elev. 222 m). Relative errors were used in order to keep the OF scaled
approximately between 0 and 1 and to use the same values of c^0 used in case study 1.
Otherwise, the behavior of the acceptance criterion shown in Equation 5-1 would change
significantly unless c^0 were altered to accommodate the different scale. The crop model
parameters and their limits are shown in Table 5-1.
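
To illustrate why relative errors keep the OF on a scale comparable to c^0 = 1, the sketch below computes an RMSE of relative residuals. The exact normalization is an assumption (each residual is divided by the observed yield); the text only states that relative errors were used, and the yield figures are hypothetical.

```python
import math


def relative_rmse(simulated, observed):
    """RMSE of relative errors: each residual is divided by the observed yield."""
    rel = [(s - o) / o for s, o in zip(simulated, observed)]
    return math.sqrt(sum(r * r for r in rel) / len(rel))


# Hypothetical yields (kg/ha) for two seasons at one location.
simulated = [9000.0, 7200.0]
observed = [10000.0, 8000.0]
of_relative = relative_rmse(simulated, observed)

# For comparison: the same residuals in absolute units.
of_absolute = math.sqrt(
    sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed)
)
```

In this example both seasons are under-predicted by 10%, so the relative OF is 0.1, whereas the absolute RMSE is several hundred kg/ha. With c near 1, a deterioration of that absolute size would make the exponential in Equation 5-1 vanish, so essentially no uphill move would ever be accepted.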

Table 5-1. Crop model parameters and ranges for case study 2.

Parameter  Definition                                           Units        Minimum  Maximum  No. of points
KSAT       Saturated hydraulic conductivity, bottom soil layer  cm d^-1      0.0001   671      16
CN2        SCS runoff curve number                              -            72       92       17
SDEP       Soil depth                                           cm           45       165      17
PDEN       Plant density                                        plants m^-2  3        8        17


In  both  case  studies,  the  parameterization  process  was  run  for  multiple  scenarios 
defined  by  all  the  possible  combinations  of  the  SA  parameter  values  shown  in  Table  5-2. 
For  example,  case  study  1  had  6  x  7  x  1  x  1  x  3  x  7  x  5  =  4,410  scenarios.  Each  scenario 
was  run  seven  times  with  different  initial  conditions. 

It  is  important  to  clarify  the  difference  between  a)  the  crop  model  (or  objective 
function) parameters, i.e. the value combinations that form the parameter space for the
optimization algorithm, and b) the SA parameters shown in Table 5-2, which control the
duration and convergence properties of the SA algorithm.


Table 5-2. Simulated annealing scenario parameters.

SA parameter               Values
α_c                        0.995, 0.99, 0.95, 0.9, 0.85, 0.8
α_h                        1, 0.995, 0.99, 0.95, 0.9, 0.85, 0.8
c^0                        1
h^0                        1
c^f                        0.01, 0.001, 0.0001
m                          1, 2, 5, 10, 20, 50, 100
Space size (case study 1)  201², 101², 67², 51², 25², 19²
Locations (case study 2)   1-13


Results  and  Discussion 

Case  Study  1 

The results of this case study show how SA can converge to a local optimum if not
allowed a sufficient number of iterations, especially if α_h < 1. Figure 5-3 shows the
results of six runs: three for α_h = 1 and three for α_h = 0.995. In the former, all parameter
combinations were always reachable from all the others, whereas in the latter, parameter
changes were bounded by a progressively smaller radius (Equation 5-2). Note how, after
roughly 2000 iterations spent mostly at low OF values, the α_h = 1 runs all reached the
close vicinity of the global optimum (where the OF = 1). Conversely, the three α_h = 0.995
scenarios frequently converged to local optima. The best convergence to the global
optimum in the 201x201 space occurred with α_c = 0.995, α_h = 1, c^f = 0.0001, and m ≥ 5,
which required approximately a quarter of the number of OF calculations required by a
grid search.

Using α_h < 1 was successful for solving other combinatorial optimization problems
(van Groenigen and Stein, 1998; Ferreyra, 2002). Its poor performance in this study may
be due to the limits imposed on the total number of iterations as determined by m and α_c.
Attaining equilibrium at each level of c is important in SA yet is not possible when m is
too small (Aarts and Korst, 1990). The α_h = 1 scenarios did well because the region
around the global optimum was as probable as any other. Conversely, when α_h < 1
and m = 5 the region was not visited often enough to converge to the global optimum.

Figure 5-3. Objective function vs. number of algorithm iterations for six runs of the
simulated annealing algorithm. The left column corresponds to α_h = 1, the
right to α_h = 0.995. In all cases α_c = 0.995, c^f = 0.0001, and m = 5.


Minimizing  the  fraction  of  N  (total  iterations)  corresponding  to  unique  objective 
function calls (or model runs), i.e. maximizing the number of database hits, is very
desirable. Complex crop models such as CROPGRO tend to run slowly because they
simulate  many  complex  physical  processes.  For  example,  CROPGRO's  ETPHOT  gas 
exchange  routine  (Boote  and  Pickering,  1994)  requires  the  iterative  solution  of  a  system 
of  linear  equations  to  simultaneously  determine  transpiration,  leaf-scale  energy  balance, 
and  photosynthesis.  A  proposed  revised  soil  temperature  simulation  routine  (Andales  et 
al.,  2000)  has  similar  requirements.  In  contrast,  the  parameterization's  database 
management  overhead  only  requires  integer  arithmetic  and  very  fast  indexing  operations 
in  data  structures  contained  wholly  in  the  computer's  memory.  The  SA  process  does 
require  real  arithmetic  for  the  calculation  of  the  OF,  the  acceptance  criterion,  and  the 
cooling  schedule.  However,  one  CROPGRO  run  may  use  as  much  time  as  the  whole  SA 
process  plus  thousands  of  database  hits.  Thus,  this  study  only  compared  runtimes  in  terms 
of  model  runs. 

Figure 5-4 shows the number of unique model runs required by the SA algorithm
for three different parameter space sizes and three different values of m. The box plots in
the left column correspond to α_h = 1, the ones on the right to α_h = 0.995. Each row of
plots shows a different m; the bottom row (m = 100) represents 20 times more algorithm
iterations than the top row (m = 5). The three box plots per graph show different
parameter space sizes (the leftmost is the original 201x201 grid). All the scenarios have
α_c = 0.995 and c^f = 0.0001.

All the scenarios in a given row (six scenarios, with seven repetitions per scenario)
ran for the same number of iterations (9,190, 36,760, and 183,800 for
m = 5, 20 and 100, respectively), but the number of actual OF calculations (equivalent to
crop model runs) varied greatly depending on the parameter space size and the value of
α_h. The dependence on parameter space size is explained by the number of available
parameter combinations, much lower for a 51x51 space than for a 201x201 space, and by
the probability of visiting a previously used cell (and retrieving data from the database
instead of running the model), which is higher in the smaller space. The differences
between results for different α_h values have similar causes: the total number of model
runs was higher for α_h = 1 because every cell in the parameter space was always a valid
destination, so the probability of revisiting a cell was relatively low. Conversely, when
α_h < 1, the set of valid destinations shrank as h decreased, and the probability of a
database hit (revisiting a previously simulated cell) increased.

Figure 5-4. Unique objective function calculations (equivalent to crop model runs) for 18
scenarios (7 repetitions / scenario) in case study 1. The dot shows the median
value for the 7 repetitions; the whiskers show the extremes.

Figure 5-4 also shows how the simulated annealing approach as used by Braga (2000),
Paz et al. (2001), and Irmak et al. (2001) can run more slowly than the grid search even
for one location: if a high value of m is adopted (for example, m = 100, corresponding to
183,800 iterations) to assure convergence to the global optimum, the number of iterations
(which corresponded to crop model runs in the abovementioned studies) greatly exceeds
the size of the parameter space (at most 201x201 = 40,401).
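
A back-of-the-envelope estimate helps show how data reuse caps the number of model runs. The sketch below is not part of the study's results: it assumes that candidate cells are drawn uniformly and independently over the whole space (roughly the α_h = 1 situation), in which case the expected number of distinct cells among N draws from S cells is S(1 - (1 - 1/S)^N).

```python
def expected_unique_runs(space_size, iterations):
    """Expected number of distinct cells among `iterations` uniform draws
    from `space_size` cells: S * (1 - (1 - 1/S)**N)."""
    return space_size * (1.0 - (1.0 - 1.0 / space_size) ** iterations)


grid_runs = 201 * 201  # a grid search needs one run per cell
# Fraction of the grid-search cost for m = 5, 20, and 100 in case study 1.
fractions = {
    n_iter: expected_unique_runs(grid_runs, n_iter) / grid_runs
    for n_iter in (9190, 36760, 183800)
}
```

Under this simplification the m = 5 schedule needs roughly a fifth of the grid-search runs, while the m = 100 schedule visits nearly every cell but, thanks to reuse, can never require more than the 40,401 runs of the grid search.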
Case  Study  2 

Case study 1 provided the SA algorithm with nearly worst-case conditions; its
well-behaved scenarios (α_c = 0.995, α_h = 1.000, c^f = 0.0001, and m ≥ 5) were expected to
behave well in case study 2, even though the number of iterations allowed would be a
smaller fraction of the parameter space size.

Figure 5-5. Total model runs vs. number of locations per environment for case study 2.
Both simulated annealing scenarios had c^f = 0.0001 and m = 5.


Figure 5-5 shows how the number of unique model runs grew with the number of
locations of interest within the environment. The top line corresponds to the grid search,
in which CERES was run for the whole parameter space irrespective of the number of
locations of interest within the environment. The other curves correspond to the SA
algorithm and different α_h values. Both SA scenarios tended asymptotically towards the
parameter space size, but the α_h = 0.995 scenario did so more slowly than the α_h = 1
scenario. This occurred because the algorithm converged faster with decreasing values of

Figure 5-6. Error at each location of interest for the grid search and two simulated
annealing scenarios of case study 2. The simulated annealing scenarios have
α_c = 0.995, c^f = 0.0001, and m = 5.


As in case study 1, the α_h = 0.995 scenario produced the greater error. Figure 5-6
shows the median and extreme final OF values for seven runs in each of 13 locations in
the Suggs 4 field. Note how the α_h = 0.995 scenario had consistently higher and more
variable OF values than the α_h = 1 scenario, which in turn had very similar values to the
grid search case. The minor differences occurring between the latter two methods in some
locations can generally be fixed at the end of the SA algorithm using a
gradient-descent-like process consisting of a few dozen extra iterations with h ≪ 1 and
c = 0. This ensures that the algorithm converges to the best OF value in the vicinity of the
current location. Using this final step in conjunction with a low value of m or α_h does not
compensate for the lack of iterations, however, because there is a high probability that the
algorithm would converge to a local optimum. Despite the observation by Welch (1999b)
that crop models tend to have a smooth response to parameter variation, Royce et al.
(2001) showed that local optima may occur frequently.

To add perspective to the results of Figure 5-5 for α_h = 1, the case studies shown by
Irmak et al. (2001) had 9 to 48 total locations distributed among up to 11 soil types, and
Paz et al. (1998) used 100 locations divided among seven soil types. Using the proposed
algorithm in precision agriculture projects such as these would result in runtime savings
of 25% to 75%. Additionally, landscape-position-dependent rules can be easily added to
the  SA  generation  mechanism  to  exclude  invalid  parameter  combinations  and  reduce  the 
effective  parameter  space  size  as  suggested  by  Irmak  et  al.  (2001). 

Unlike  a  grid  search,  the  SA  algorithm  can  be  used  to  explore  very  large  parameter 
spaces.  Most  crop  modeling  applications  in  precision  agriculture  studies  to  date  have 
assumed  that  there  is  no  coupling  among  the  processes  occurring  in  different  landscape 
positions,  and  that  consequently  all  the  locations  in  the  landscape  can  be  simulated 
independently.  In  a  situation  with  coupling  due  to  spatial  water  movement,  for  example, 
the  parameters  at  one  location  can  affect  the  simulation  results  (and  consequently,  the 
parameter estimates) at another. The database method is inapplicable in this circumstance
because the optimization problem size increases exponentially with the number of
locations of interest. An approach using SA can still produce a good approximate answer,
however. As an example of a large combinatorial problem solved with SA, the soil water
monitoring network optimization problem solved by Ferreyra et al. (2002) had
4.318 × 10^10 parameter combinations and consistently converged to the same good
solution.
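
To illustrate why a precomputed database stops being practical under coupling, the short sketch below compares search-space sizes. The figures used (100 candidate parameter combinations per location, 1 to 10 locations) are assumptions chosen only for illustration, not values from this study.

```python
def uncoupled_runs(k: int, n: int) -> int:
    """Uncoupled case: one database of k model runs serves all n locations."""
    return k


def coupled_combinations(k: int, n: int) -> int:
    """Coupled case: the parameters of all n locations must be evaluated jointly."""
    return k ** n


for n in (1, 2, 5, 10):
    print(n, uncoupled_runs(100, n), coupled_combinations(100, n))
```

The uncoupled count stays flat while the coupled count gains two orders of magnitude per added location, which is why only a sampling-based search such as SA remains feasible.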

Finally,  the  proposed  algorithm  can  help  introduce  crop-modeling-based  tools  into 
practical  industry  applications.  Figure  5-3  shows  how  the  SA  algorithm  can  generate 
good "first cut" solutions in only a few hundred iterations, making it possible to answer
"what-if" management optimization questions in a matter of seconds rather than hours.

Conclusions 

This  study  shows  that  the  runtime  of  a  simulated-annealing-based  crop  model 
parameterization  process  is  greatly  reduced  through  the  reuse  of  simulation  results  across 
successive  iterations  of  the  SA  algorithm  and  across  locations  within  an  environment. 
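
The reuse of simulation results can be pictured as a cache keyed by parameter combination. The example below is a minimal sketch under stated assumptions, not the algorithm evaluated in this chapter: `toy_model` is a hypothetical stand-in for a CERES run, the 10 × 10 parameter grid and the observed yields are invented, and the annealing schedule is a generic one in which `alpha_c` merely borrows the name of the cooling-rate parameter.

```python
import math
import random


def toy_model(params):
    """Hypothetical stand-in for one expensive crop model run."""
    depth, drainage = params
    return 100.0 - (depth - 6) ** 2 - 2.0 * (drainage - 3) ** 2


class CachedModel:
    """Runs the model once per unique parameter combination and reuses the result."""

    def __init__(self, model):
        self.model = model
        self.cache = {}      # parameter combination -> simulated yield
        self.requests = 0    # how many times a result was asked for

    def __call__(self, params):
        self.requests += 1
        if params not in self.cache:
            self.cache[params] = self.model(params)
        return self.cache[params]


def anneal(run, observed, grid, rng, iterations=400, c=50.0, alpha_c=0.995):
    """Generic simulated annealing on a discrete grid; minimizes |simulated - observed|."""
    current = tuple(rng.randrange(n) for n in grid)
    current_err = abs(run(current) - observed)
    best, best_err = current, current_err
    for _ in range(iterations):
        # Propose a neighbor: move one parameter index by one step, staying on the grid.
        i = rng.randrange(len(grid))
        candidate = list(current)
        candidate[i] = min(grid[i] - 1, max(0, candidate[i] + rng.choice((-1, 1))))
        candidate = tuple(candidate)
        err = abs(run(candidate) - observed)
        # Always accept improvements; accept worse moves with a probability that shrinks as c cools.
        if err <= current_err or rng.random() < math.exp((current_err - err) / c):
            current, current_err = candidate, err
            if err < best_err:
                best, best_err = candidate, err
        c *= alpha_c
    return best, best_err


def calibrate_field(observed_yields, grid=(10, 10), seed=1):
    """Calibrates every location against one shared cache of model runs."""
    run = CachedModel(toy_model)
    rng = random.Random(seed)
    results = [anneal(run, y, grid, rng) for y in observed_yields]
    return results, run


results, run = calibrate_field([95.0, 80.0, 60.0, 91.0, 72.0])
print(f"{run.requests} results requested, {len(run.cache)} unique model runs")
```

Because the cache is shared across locations, the number of unique model runs can never exceed the size of the parameter grid, however many locations are calibrated; this is the asymptote discussed for Figure 5-5.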

The  performance  of  the  modified  simulated  annealing  algorithm  used  was 
parameter  value  dependent.  However,  a  conservative  parameter  combination  was  found 
(α_c = 0.995, α_h = 1.000, c_f = 0.0001, and m > 5) that ran much faster than a grid search,
its  runtime  tending  asymptotically  to  that  of  the  grid  search  as  the  number  of  locations  of 
interest  grew,  while  converging  to  objective  function  values  (and  the  corresponding 
parameter  combinations)  practically  identical  to  the  global  optima  determined  using  the 
grid  search  method. 

Adoption of the proposed algorithm can produce runtime reductions on the order of
25% to 75%, depending on the geometry of the simulation domain. Additionally, it can be
used to parameterize coupled spatial crop models in which parameter values at one
location  can  affect  parameter  values  at  other  locations,  a  task  not  possible  using  a  grid 
search.  Finally,  the  SA  algorithm  can  very  quickly  produce  approximate  answers  useful 
in  practical  applications. 


CHAPTER  6 

USING BAYESIAN NETWORKS TO HELP UNDERSTAND CAUSAL
RELATIONSHIPS

Introduction 

Agriculture  is  a  complex  endeavor:  inflation-adjusted  commodity  prices  decreased 
steadily  during  the  20th  century  (USDA  NASS,  1994),  forcing  farmers  to  attain 
progressively  better  crop  yields  in  order  to  survive.  Concurrently,  environmental 
regulations,  labor  constraints,  and  interannual  climate  variability  also  drive  farmers 
toward  risk  management  and  the  optimization  of  management  decisions  such  as  planting 
dates,  seed  treatments,  fertilization  rates,  variety  selection,  etc. 

Understanding  the  causes  of  yield  variability  and  crop  response  to  management  is 
necessary  before  management  decisions  can  be  optimized.  Agricultural  systems  are  very 
complex,  in  great  part  because  of  the  multiple  yield-affecting  interactions  between  crops, 
their  environment,  and  management  (Lawrence  et  al.,  2000).  This  complexity  may 
obscure  decision-makers'  understanding  of  some  cause-effect  relationships  in  agricultural 
fields,  such  as  the  link  between  applied  fertilizer  amounts  and  crop  yield. 

Farmers  frequently  seek  the  help  of  crop  consultants  and  extension  professionals 
for  understanding  the  consequences  of  different  management  options.  Farmers  are  also 
increasingly  adopting  information  technology  (Schmidt  et  al.,  1994),  using  personal 
computers  for  numerous  tasks  such  as  accounting,  managing  inventory,  researching 
prices,  checking  weather  forecasts,  reading  news,  buying  and  selling  products,  etc.  Large 
corporate investments in web-delivered services and content, such as Farm Assist
(www.farmassist.com), also suggest a growing use of computers for decision support by
farmers.

Crop  consultants  and  extension  professionals  possess  a  great  deal  of  knowledge 
about  agricultural  systems,  but  when  the  system  is  complex  it  may  be  difficult  to  transmit 
that  knowledge,  and  evaluating  the  success  of  the  communication  process  may  not  be 
easy  (Lawrence  et  al.,  2000).  Many  scholars  and  various  active  learning  theories  have 
stressed  the  importance  of  experience  and  reflection  in  learning  and  practice  (Kotval, 
2003);  however,  a  thorough  hands-on  field  exploration  of  the  agricultural  management 
options  available  to  specific  farmers  is  not  cost-effective. 

It  would  be  desirable  to  have  a  computer-assisted  medium  through  which  the  crop 
consultant  /  extension  professional  and  the  farmer  could  discuss  cause-effect  relationships 
and  jointly  build  a  simple  model  of  the  agricultural  system  that  behaves  realistically 
while  preserving  a  clear,  easily  understood  representation  of  the  causal  relationships 
involved.  This  type  of  tool  should  be  able  to  represent  the  expert  knowledge  of  both  the 
consultant  and  the  farmer,  but  the  model  should  evolve  gradually,  keeping  pace  with  the 
discussion  process. 

The objectives of this paper are to provide an overview of a powerful yet simple
probabilistic causal modeling technology, to discuss how it can be used to help
understand cause-effect relationships, and to present a detailed example, a method for
quantifying probabilities, and an online source of additional information.

Bayesian Nets as Simple Expert Systems to Help Explain and Understand How
Things Work

Expert  knowledge  has  often  been  represented  and  used  to  make  inferences  by 
means of computer programs called expert systems. Most expert systems have been built
using a knowledge base, typically consisting of a large set of "if-then" rules (Metaxiotis
et  al.,  2002).  Building,  debugging,  modifying  and  maintaining  this  type  of  model  can  be 
very  complex,  and  is  not  easily  mastered  by  the  non-specialist  (Darwiche,  2000). 

Probability  theory  provides  an  alternative  technology  for  building  expert  systems: 
Bayesian  networks  or  BNs  (Pearl,  1988).  Formally,  a  Bayesian  network  can  be  defined  as 
"a  specification  of  a  joint  probability  distribution  of  several  variables  in  terms  of 
conditional distributions for each variable" (Nadkarni and Shenoy, 2001). We will
clarify  the  meaning  of  this  definition  below. 

A  BN  model  is  represented  at  two  levels,  qualitative  and  quantitative.  Qualitatively, 
a  BN  is  an  acyclic  graph  (a  network  of  nodes  and  arcs  arranged  so  that  it  contains  no 
loops)  in  which  nodes  represent  variables  and  the  directed  arcs  linking  the  nodes  describe 
probabilistic  relationships  embedded  in  the  model  (Nadkarni  and  Shenoy,  2001).  Assume 
two  variables  X  and  Y,  each  with  a  set  of  possible  values  or  states  (its  state  space) 
consisting  of  mutually  exclusive  and  exhaustive  values  of  the  variable.  If  there  is  an  arc 
pointing  from  X  to  Y,  we  say  that  X  is  a  parent  of  Y.  Nodes  that  have  no  parents  are 
called  root  nodes. 

The  quantitative  component  of  a  BN  specifies  the  probabilities  for  the  root  and 
non-root  nodes.  Each  root  node  has  a  simple  distribution  of  probabilities  for  its  different 
states.  These  prior  probabilities  represent  existing  prior  knowledge  about  the  state  of  the 
variable. The probabilities for the states of other nodes, i.e., those with parents, may depend
on  the  state  of  each  parent,  and  must  be  specified  in  those  terms.  This  type  of  probability 
distribution  is  represented  with  a  conditional  probability  table,  examples  of  which  are 
shown further below.

The joint probability distribution in the BN definition above is the end result of
combining  the  conditional  probability  tables  with  the  prior  probabilities.  A  BN  is  the 
specification  or  "roadmap"  of  how  the  joint  probability  distribution  is  built. 
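
In symbols, this combination is the standard chain-rule factorization of a Bayesian network, stated here for reference. The notation pa(X_i) denotes the set of parents of node X_i; for a root node this set is empty and the factor is simply the prior.

```latex
P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{pa}(X_i)\bigr)
```

For the soil roughness network of Figure 6-2, writing T = Tillage, L = Last crop, Y = Yield of last crop, S = Time of the year, M = Plant material, D = Residue, and R = Roughness (single-letter symbols introduced here only for brevity), the factorization reads:

```latex
P(T, L, Y, S, M, D, R) = P(T)\,P(L)\,P(Y)\,P(S)\,P(M \mid L, Y)\,P(D \mid M, T)\,P(R \mid D, S)
```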

Bayesian  networks  can  be  used  to  make  inferences  about  system  behavior. 
Probabilistic  inference  refers  to  the  process  of  computing  the  probability  distributions  of 
a  set  of  variables  of  interest  after  obtaining  some  observations  of  other  variables  in  the 
model  and  propagating  that  information  through  the  network  (Nadkarni  and  Shenoy, 
2001).  There  are  two  possible  kinds  of  inferences  in  a  BN,  deductive  and  abductive.  In 
deductive  inference,  the  properties  of  effects  are  inferred  from  knowledge  about  their 
causes.  In  abductive  inference,  knowledge  about  the  effects  is  used  to  propose  the  most 
likely  distribution  of  the  causes. 

Bayesian  networks  or  their  qualitative  component,  causal  diagrams,  have  been  used 
primarily  in  medical  applications  (Sierra  et  al.,  2000)  such  as  epidemiological  modeling 
(Greenland et al., 1999), the automated discovery of adverse drug reactions (Orre et al.,
2000), cardiac diagnosis (Nikovsky, 2000), and predicting obesity risk (Bunn et al.,
1999). Other applications include data mining (Heckerman, 1997), finance (Gemela,
2001), data fusion for desertification studies (Stassopoulou et al., 1998), environmental
impact  studies  (Marcot  et  al.,  2001),  and  enhancing  the  functionality  of  Microsoft 
software  (Helm,  1996).  Agricultural  applications  in  the  literature  include  predicting  crop 
yields  and  pest  effects  (Kristensen  and  Rasmussen,  2002),  yield  response  to  fungicides 
(Tari,  1996),  and  agricultural  image  processing  (Onyango  et  al.,  1997). 

There are several programs available to help in the qualitative and quantitative
phases of BN development. Software such as Netica (Norsys, 1998) allows a user to
graphically edit the network's qualitative structure, create and populate the probability
tables,  and  perform  both  deductive  and  abductive  inference.  The  results  are  shown 
graphically  as  bar  charts. 

A  Simple  Example 

We  conducted  on-farm  research  on  an  agricultural  operation  near  Murray,  KY 
during  the  2000-2002  cropping  seasons.  Our  work  included  building  BNs  to  help 
understand  crop  yield  variability  over  space  and  time  for  decision-making  in  precision 
agriculture  (Morgan  and  Ess,  1997).  Working  with  an  NRCS  soil  scientist,  crop 
consultants  and  the  farmer,  we  built  several  Bayesian  networks  to  aid  in  discussing  and 
understanding  the  physical  and  biological  processes  that  were  taking  place  in  the  field. 

Our  example  focuses  on  one  of  the  processes  we  discussed:  surface  water  flow  over 
the  field  following  a  storm.  We  agreed  that  the  rate  of  runoff  water  flow  over  the  surface 
is  related,  among  other  things,  to  the  roughness  of  the  soil  surface.  We  made  a  simple 
model  of  soil  surface  roughness  through  discussion  with  the  local  experts,  and  used  it  to 
discuss  possible  management  alternatives. 

Obtaining  trustworthy  probability  estimates  from  experts  may  seem  very  difficult. 
However,  inference  results  are  more  affected  by  the  qualitative  structure  of  the  BN  than 
by  uncertainty  in  the  probability  estimates  (Darwiche  and  Goldszmidt,  1994).  It  is  more 
important  to  obtain  a  consistent  way  of  mapping  experts'  perceptions  to  probabilities  than 
it  is  to  ensure  that  the  probability  values  adhere  strictly  to  reality.  To  achieve  this 
consistency  we  used  (and  suggest)  a  modified  form  of  a  scale  proposed  by  Renooij  and 
Witteman (1999), shown in Figure 6-1.

[Figure 6-1 scale labels, from top to bottom: Certain, Probable, Expected, Fifty-fifty,
Uncertain, Improbable, Impossible.]

Figure 6-1. Scale used to translate between verbal quantifiers and probabilities (expressed
as percentages). Adapted from Renooij and Witteman (1999).

Deductive  Inference 

Figure 6-2 shows a simple BN we built using Netica v. 1.12 software (Norsys,
1998) to predict soil surface roughness based on crop residue accumulation. Each box
represents a variable and shows the possible states the variable can take (a design
decision made during the discussion process) together with their corresponding
probabilities in bar chart form. The root nodes (independent variables) are Tillage (the
type of tillage: Minimum till or No till), Last_Crop (the crop previously grown in the
field: soybeans or corn), Yield_of_last_crop (how much that crop yielded: low, medium,
or high), and Time_of_the_year (the time of interest: Pre-planting, Crop growth season,
or Post-harvest).

After  identifying  the  root  nodes,  we  used  Last  crop  and  Yield  of  last  crop  as 
parents  of  Plant  material  (how  much  plant  material  was  produced),  which  we  used 
together with Tillage to condition Residue (amount of residue available at the surface),
which was used in turn with Time of the year to condition Roughness, our dependent
variable  of  interest.  Note  how  the  arrows  in  Figure  6-2  lead  from  causes  to  effects. 

After  the  qualitative  definition  we  quantified  probabilities.  We  set  up  the  individual 
conditional  probability  tables,  and  then  used  the  network  to  make  inferences,  exploring 
the  network's  results  for  different  inputs.  If  results  were  unexpected  or  unreasonable,  we 
re-examined  and  discussed  the  conditional  probability  tables,  making  adjustments  if 
necessary,  testing  again,  etc. 


Node                 State probabilities (%)
Tillage              Minimum till 25.0; No till 75.0
Last_Crop            Soybean 50.0; Corn 50.0
Yield_of_last_crop   Low 25.0; Medium 50.0; High 25.0
Time_of_the_year     Preplanting 33.0; Crop growth 34.0; Post harvest 33.0
Plant_material       Low 33.8; Medium 40.6; High 25.6
Residue              Low 19.6; Medium 36.9; High 43.5
Roughness            Very Low 5.10; Low 12.6; Medium 25.6; High 28.0; Very High 28.7

Arcs: Last_Crop and Yield_of_last_crop -> Plant_material; Plant_material and Tillage ->
Residue; Residue and Time_of_the_year -> Roughness.


Figure  6-2.  Causal  model  of  soil  roughness  made  using  a  Bayesian  network. 


In  Figure  6-2,  the  prior  probabilities  of  the  root  nodes  shown  in  the  bar  charts 
reflect  the  experts'  initial  ideas  regarding  the  variables'  probability  distributions.  The 
variables that do have parents, i.e., Plant material, Residue, and Roughness, have
conditional probability tables (Figure 6-3); the probabilities that are shown in these
nodes' bar charts are the posterior probabilities, i.e., the probabilities calculated using the
probability tables and knowledge about the state (or probability distribution) of the parent
variables.


A (Plant_material), probabilities in %:

Yield_of_last_crop   Last_Crop   Low      Medium   High
Low                  Soybean     75.000   25.000    0.000
Low                  Corn        50.000   25.000   25.000
Medium               Soybean     35.000   50.000   15.000
Medium               Corn        25.000   50.000   25.000
High                 Soybean     25.000   50.000   25.000
High                 Corn         0.000   25.000   75.000

B (Residue), probabilities in %:

Plant_material   Tillage        Low      Medium   High
Low              Minimum_till   90.000   10.000    0.000
Low              No_till        25.000   50.000   25.000
Medium           Minimum_till   50.000   30.000   20.000
Medium           No_till         0.000   50.000   50.000
High             Minimum_till   10.000   50.000   40.000
High             No_till         0.000   10.000   90.000

C (Roughness), probabilities in %:

Time_of_the_year   Residue   Very Low   Low      Medium   High     Very High
Preplanting        Low       20.000     35.000   40.000    5.000    0.000
Preplanting        Medium    10.000     20.000   35.000   25.000   10.000
Preplanting        High       0.000      5.000   25.000   40.000   30.000
Crop_growth        Low       20.000     35.000   40.000    5.000    0.000
Crop_growth        Medium    10.000     20.000   35.000   25.000   10.000
Crop_growth        High       0.000      5.000   25.000   40.000   30.000
Post_harvest       Low        0.000     25.000   40.000   30.000    5.000
Post_harvest       Medium     0.000      0.000   15.000   50.000   35.000
Post_harvest       High       0.000      0.000    0.000   10.000   90.000

Figure 6-3. Conditional probability tables for the soil roughness model: panels A, B, and
C are for the Plant_material, Residue, and Roughness nodes, respectively.


Figure 6-3 shows the conditional probability tables defined for the model: 6-3A is
for the Plant material node, 6-3B is for the Residue node, and 6-3C is for the Roughness
node. Let us examine Figure 6-3A in detail; the two left columns show the different
combinations of states of the parent nodes, and the three columns on the right show the
probability of each possible state of the Plant material node given the states of its parents
shown at left. For example, if the antecedent crop was soybean and its yield was low,
there is a 75% probability that Plant material was Low, 25% that it was Medium, and 0%
that it was High. If the antecedent crop was corn and its yield was Medium, the
probabilities would have been 25%, 50%, and 25%.
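
A conditional probability table maps each combination of parent states to a distribution over the child's states. As a minimal sketch (the dictionary layout and the function name `plant_material_given` are illustrative choices, not part of Netica), Figure 6-3A can be written down and checked as follows:

```python
# Figure 6-3A: P(Plant material | Yield of last crop, Last crop), in percent.
STATES = ("Low", "Medium", "High")
PLANT_MATERIAL_CPT = {
    ("Low", "Soybean"): (75.0, 25.0, 0.0),
    ("Low", "Corn"): (50.0, 25.0, 25.0),
    ("Medium", "Soybean"): (35.0, 50.0, 15.0),
    ("Medium", "Corn"): (25.0, 50.0, 25.0),
    ("High", "Soybean"): (25.0, 50.0, 25.0),
    ("High", "Corn"): (0.0, 25.0, 75.0),
}


def plant_material_given(yield_state, crop):
    """Returns the distribution of Plant material for one combination of parent states."""
    return dict(zip(STATES, PLANT_MATERIAL_CPT[(yield_state, crop)]))


# Each row is a probability distribution, so it must add up to 100%.
for parents, row in PLANT_MATERIAL_CPT.items():
    assert abs(sum(row) - 100.0) < 1e-9, parents

print(plant_material_given("Low", "Soybean"))
print(plant_material_given("Medium", "Corn"))
```

The row-sum check is a cheap way to catch transcription errors when probabilities are elicited in a discussion and typed in by hand.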

Careful  qualitative  modeling  results  in  conditional  probability  tables  that  are  easy 
to  understand  and,  consequently,  easy  to  populate  with  probabilities  elicited  from  the 
discussion  process.  Note  how  the  column  headers  of  Figure  6-3  correspond  to  simple, 
easily  understood  verbal  quantifiers:  "Low",  "Medium",  "High",  etc.  Using  these 
quantifiers  is  convenient  and  can  simplify  interaction  with  farmers  and  other  domain 
experts,  but  consistency  must  be  maintained  at  all  times. 

Conditional  independence  (Jensen,  2001)  is  an  important  aspect  of  Bayesian 
networks. Two variables A and C are conditionally independent given variable B if
P(A | B) = P(A | B, C), i.e., if, given the state of B, knowing the state of C does not
change the probabilities we assign to the states of A. It is assumed that BN variables having the
same  parent  are  conditionally  independent.  This  assumption  is  not  important  when 
experimenting  with  small  networks  as  discussion  support  tools,  but  violating  it  may  lead 
to  incorrect  inferences,  especially  in  large  networks.  Introducing  additional  variables  to 
simplify  the  network  topology  can  help  maintain  conditional  independence  (Kwoh  and 
Gillies,  1996). 

The  structure  of  the  BN  shown  in  Figure  6-2  is  very  simple;  we  deliberately  kept 
the  number  of  parents  and  children  of  each  node  to  a  minimum.  This  resulted  in  easily 
understood  variables  and  also  minimized  the  size  of  the  conditional  probability  tables. 
Experts  also  have  an  easier  task  estimating  probabilities  conditioned  on  two  variables 
(e.g., Roughness conditioned on Time_of_the_year and Residue) than conditioned on four;
compare Figure 6-2 with a network in which Roughness directly had the four root
variables as its parents.
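
The size argument can be made concrete by counting table entries. The sketch below uses only the state counts visible in Figure 6-2; the helper name `cpt_entries` is an illustrative choice.

```python
from math import prod

# Number of states of each node in Figure 6-2.
N_STATES = {
    "Tillage": 2, "Last_Crop": 2, "Yield_of_last_crop": 3, "Time_of_the_year": 3,
    "Plant_material": 3, "Residue": 3, "Roughness": 5,
}


def cpt_entries(child, parents):
    """Rows = product of the parents' state counts; entries = rows x child states."""
    rows = prod(N_STATES[p] for p in parents)
    return rows, rows * N_STATES[child]


layered = cpt_entries("Roughness", ["Time_of_the_year", "Residue"])
flat = cpt_entries(
    "Roughness", ["Tillage", "Last_Crop", "Yield_of_last_crop", "Time_of_the_year"]
)
print("Roughness with two parents (rows, entries):", layered)
print("Roughness with four root parents (rows, entries):", flat)
```

With two parents the Roughness table has 9 rows (45 probabilities); with the four root variables as direct parents it would have 36 rows (180 probabilities), each requiring the expert to reason about four conditions at once.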

As  shown,  the  bar  chart  of  the  Roughness  box  of  Figure  6-2  represents  the 
probability  distribution  of  the  Roughness  variable  given  no  prior  knowledge  about  the 
specific  state  of  the  independent  variables  except  their  probability  distributions.  Variants 
including  prior  knowledge  are  shown  in  Figures  6-4  to  6-6. 


Node                 State probabilities (%)
Tillage              Minimum till 100; No till 0 (forced)
Last_Crop            Soybean 100; Corn 0 (forced)
Yield_of_last_crop   Low 0; Medium 100; High 0 (forced)
Time_of_the_year     Preplanting 0; Crop growth 0; Post harvest 100 (forced)
Plant_material       Low 35.0; Medium 50.0; High 15.0
Residue              Low 58.0; Medium 26.0; High 16.0
Roughness            Very Low 0; Low 14.5; Medium 27.1; High 32.0; Very High 26.4

Figure  6-4.  Deductive  inference  on  a  Bayesian  network.  Note  how  some  variables  (shown 
in  a  darkened  box)  are  forced  to  known  states,  and  how  the  network  makes  a 
deductive  inference  about  the  distribution  of  the  Roughness  variable 
conditioned  on  the  forced  inputs. 


Figure  6-4  shows  how  the  probability  distribution  of  Roughness  changes  when  the 
user  specifies  additional  prior  knowledge,  by  forcing  (through  a  mouse  click)  the 
independent  variables  to  known  states.  Note  how  these  forced  variables  (which  now 
represent input data) appear darkened in the figure. In Figure 6-4 the specified states are
Minimum till, Soybean, and a Medium yield. The model predicts that in the post-harvest
period  Roughness  will  have  a  58.4%  probability  of  being  high  or  very  high,  and  a  14.5% 
probability  of  being  low  or  very  low.  Figure  6-5  differs  from  6-4  in  that  the  tillage  has 
been  changed  to  no  till.  Note  how  the  expected  amount  of  residue  increases  as  a  result  of 
a  more  residue-friendly  management,  and  the  distribution  of  Roughness  changes 
accordingly:  now  the  probability  of  high  or  very  high  is  87.8%,  and  of  low  or  very  low 
Roughness  only  2.19%. 
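
The numbers quoted for Figures 6-4 and 6-5 can be reproduced with two rounds of weighted averaging over the tables of Figure 6-3. The sketch below is plain Python rather than Netica, and its function and variable names are illustrative. It uses only the post-harvest rows of Figure 6-3C because Time_of_the_year is forced to Post harvest in both figures.

```python
# Rows of Figure 6-3 needed for Figures 6-4 and 6-5 (values in percent).
PLANT_MATERIAL = (35.0, 50.0, 15.0)  # Figure 6-3A row: Medium yield, Soybean

# Figure 6-3B: P(Residue | Plant material, Tillage).
# Rows are ordered Low, Medium, High plant material; columns Low, Medium, High residue.
RESIDUE_CPT = {
    "Minimum till": ((90.0, 10.0, 0.0), (50.0, 30.0, 20.0), (10.0, 50.0, 40.0)),
    "No till": ((25.0, 50.0, 25.0), (0.0, 50.0, 50.0), (0.0, 10.0, 90.0)),
}

# Figure 6-3C, Post harvest rows: P(Roughness | Residue).
# Rows are ordered Low, Medium, High residue;
# columns Very Low, Low, Medium, High, Very High roughness.
ROUGHNESS_POST_HARVEST = (
    (0.0, 25.0, 40.0, 30.0, 5.0),
    (0.0, 0.0, 15.0, 50.0, 35.0),
    (0.0, 0.0, 0.0, 10.0, 90.0),
)


def propagate(parent_dist, cpt_rows):
    """Marginalizes over one uncertain parent: sum_k P(parent=k) * P(child | parent=k)."""
    n_child = len(cpt_rows[0])
    return tuple(
        sum(p / 100.0 * row[j] for p, row in zip(parent_dist, cpt_rows))
        for j in range(n_child)
    )


def roughness_after_harvest(tillage):
    """Returns the Residue and Roughness distributions for a forced tillage state."""
    residue = propagate(PLANT_MATERIAL, RESIDUE_CPT[tillage])
    return residue, propagate(residue, ROUGHNESS_POST_HARVEST)


for tillage in ("Minimum till", "No till"):
    residue, roughness = roughness_after_harvest(tillage)
    high_or_very_high = roughness[3] + roughness[4]
    print(tillage, [round(x, 2) for x in residue], round(high_or_very_high, 1))
```

For minimum till this yields a Residue distribution of 58, 26, and 16% and a 58.4% probability of high or very high Roughness, matching Figure 6-4. For no till the unrounded sum is about 87.7%; the 87.8% quoted in the text is the sum of the rounded bars (29.4 and 58.4) shown in Figure 6-5.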


Node                 State probabilities (%)
Tillage              Minimum till 0; No till 100 (forced)
Last_Crop            Soybean 100; Corn 0 (forced)
Yield_of_last_crop   Low 0; Medium 100; High 0 (forced)
Time_of_the_year     Preplanting 0; Crop growth 0; Post harvest 100 (forced)
Plant_material       Low 35.0; Medium 50.0; High 15.0
Residue              Low 8.75; Medium 44.0; High 47.2
Roughness            Very Low 0; Low 2.19; Medium 10.1; High 29.4; Very High 58.4

Figure  6-5.  Deductive  inference  in  the  Bayesian  network.  Roughness  tends  to  increase 
with  respect  to  Figure  6-4  because  Tillage  is  now  in  the  No  till  state. 


Figure  6-6  differs  from  6-5  in  that  the  antecedent  crop  is  now  corn,  which  produces 
more  biomass  (Plant  material)  than  soybeans.  Consequently,  the  corresponding  posterior 
probabilities  in  the  child  nodes  change:  the  probability  for  very  high  Roughness  is  now 
62.7%, versus 58.4% in Figure 6-5.

Abductive Inference

Figure  6-7  shows  an  example  of  abductive  inference:  the  effect,  Roughness,  is 
known  (forced  to  a  given  distribution)  and  the  BN  is  used  to  make  inferences  about  the 
probability  distribution  of  a  cause,  Last  Crop.  The  arrow  directions  do  not  change  with 
respect  to  Figures  6-2  to  6-6  because  the  causal  relationships  assumed  when  defining  the 
model  are  unchanged;  only  the  way  in  which  the  network  is  being  used  is  different. 
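
Abductive inference amounts to applying Bayes' rule to the same tables. As a sketch under the evidence shown in Figure 6-7 (No till, Post harvest, High yield, and Roughness observed as Medium), the code below computes the likelihood of that observation for each candidate last crop and normalizes by the 50/50 prior of Figure 6-2. The names are illustrative and the calculation is done in plain Python, not with Netica.

```python
PRIOR_LAST_CROP = {"Soybean": 0.5, "Corn": 0.5}  # Figure 6-2

# Figure 6-3A, High-yield rows: P(Plant material = Low, Medium, High | crop).
PLANT_MATERIAL_HIGH_YIELD = {
    "Soybean": (0.25, 0.50, 0.25),
    "Corn": (0.00, 0.25, 0.75),
}

# Figure 6-3B, No till rows: P(Residue = Low, Medium, High | Plant material),
# one row per plant material state (Low, Medium, High).
RESIDUE_NO_TILL = ((0.25, 0.50, 0.25), (0.00, 0.50, 0.50), (0.00, 0.10, 0.90))

# Figure 6-3C, Post harvest rows, Medium column:
# P(Roughness = Medium | Residue = Low, Medium, High).
ROUGHNESS_MEDIUM_POST_HARVEST = (0.40, 0.15, 0.00)


def likelihood_medium_roughness(crop):
    """P(Roughness = Medium | crop, High yield, No till, Post harvest)."""
    total = 0.0
    for p_material, residue_row in zip(PLANT_MATERIAL_HIGH_YIELD[crop], RESIDUE_NO_TILL):
        for p_residue, p_rough in zip(residue_row, ROUGHNESS_MEDIUM_POST_HARVEST):
            total += p_material * p_residue * p_rough
    return total


def posterior_last_crop():
    """Bayes' rule: the posterior is proportional to prior times likelihood."""
    joint = {c: PRIOR_LAST_CROP[c] * likelihood_medium_roughness(c) for c in PRIOR_LAST_CROP}
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}


for crop, p in posterior_last_crop().items():
    print(f"{crop}: {100 * p:.1f}%")
```

The likelihoods are 8.5% for soybean and 3.0% for corn, so the posterior is 73.9% soybean and 26.1% corn, the split shown for Last_Crop in Figure 6-7.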

Abductive  inference  is  powerful;  it  is  often  used  in  medical  and  other  diagnostic 
BN  applications,  and  sets  Bayesian  networks  apart  from  regular  expert  systems:  BNs  can 
be  used  either  forward  or  backward,  whereas  rule-based  expert  systems  usually  only 
make  inferences  in  one  direction  because  if-then  rules  cannot  be  easily  inverted: 
"if  A  then  B"  is  not  equivalent  to  "if  B  then  A"! 


Node                 State probabilities (%)
Tillage              Minimum till 0; No till 100 (forced)
Last_Crop            Soybean 0; Corn 100 (forced)
Yield_of_last_crop   Low 0; Medium 100; High 0 (forced)
Time_of_the_year     Preplanting 0; Crop growth 0; Post harvest 100 (forced)
Plant_material       Low 25.0; Medium 50.0; High 25.0
Residue              Low 6.25; Medium 40.0; High 53.7
Roughness            Very Low 0; Low 1.56; Medium 8.50; High 27.2; Very High 62.7

Figure 6-6. Deductive inference in the Bayesian network. Roughness increases because
the corn crop produces more residue than soybeans.

The model developed in this example is imperfect, a gross simplification of the real
natural system. Crop residue accumulation is dependent on numerous factors not
considered here, such as soil texture, landscape position, weather, etc. These factors came
up in the discussions once the basic structure of the network had been set up; the group
discussed their influence and the convenience of including them in the model. Adding
variables is simple given the visual network editing tools included in Netica
and  similar  programs,  so  new  iterations  of  the  design  process  could  have  followed, 
adding  variables,  creating  new  conditional  probability  tables,  and  testing  the  results. 


Node                 State probabilities (%)
Tillage              Minimum till 0; No till 100 (forced)
Last_Crop            Soybean 73.9; Corn 26.1
Yield_of_last_crop   Low 0; Medium 0; High 100 (forced)
Time_of_the_year     Preplanting 0; Crop growth 0; Post harvest 100 (forced)
Plant_material       Low 38.0; Medium 48.9; High 13.0
Residue              Low 21.7; Medium 78.3; High 0
Roughness            Very Low 0; Low 0; Medium 100; High 0; Very High 0 (forced)

Figure 6-7. Abductive inference in the Bayesian network; note how Roughness is known,
but Last crop is not.

Conclusions

Bayesian  networks  provide  a  powerful  tool  for  discussing  complex  concepts.  With 
some  practice,  crop  consultants,  extension  professionals,  and  clients  with  minimal 
experience  using  personal  computers  can  easily  understand  the  probabilistic  ideas  behind 
Bayesian  networks,  as  well  as  the  use  of  interactive  software  tools  for  their  construction. 
Effective  model  definition  improves  with  experience;  the  crop  consultants  /  extension 
professionals with whom we interacted rapidly became skilled enough to define and
populate  simple  models  in  1-2  hours. 

Tools  for  Bayesian  network  modeling  are  readily  available  on  the  market.  For 
example,  a  powerful  trial  version  of  the  Netica  software  used  for  this  study  can  be 
downloaded  from  the  manufacturer's  website  (www.norsys.com). 

Finally,  the  techniques  briefly  described  here  are  not  limited  to  the  discussion  of 
agricultural  systems  management.  Any  extension  activity  that  requires  discussion  of 
cause-effect  relationships,  such  as  health  care,  safety,  and  mechanics-related  topics,  can 
benefit from Bayesian-network-supported dialogue.


CHAPTER  7 

INTEGRATING  MULTIPLE  KNOWLEDGE  SOURCES  FOR  PARAMETERIZED 
SPATIAL  CROP  MODELS  WITH  INVERSE  MODELING 

Introduction 

In  Chapter  1  we  discussed  how  the  development  of  a  practical  tool  for  the  accurate 
simulation  of  spatiotemporal  crop  yield  variability  (henceforth,  a  spatial  crop  model  or 
SCM)  could  provide  a  valuable  analytical  and  decision-support  tool  for  precision 
agriculture.  In  Chapter  2  we  showed  different  sources  of  error  in  SCMs:  model  error 
(spatially-coupled  vs.  uncoupled  forms  of  SCM),  initial  conditions,  and  parameter  error. 

Crop  model  results  are  indeed  sensitive  to  their  parameters,  some  of  which  affect 
model  outputs  significantly  more  than  others  (Favis-Mortlock  and  Smith,  1990; 
Leenhardt  et  al.,  1994).  When  the  inputs  to  which  crop  models  are  very  sensitive  are 
measured  carefully,  model  predictions  can  be  very  accurate  (Braga,  2000).  However,  in  a 
precision  agriculture  context,  input  parameters  usually  cannot  be  measured  throughout 
the  field,  and  estimation  procedures  must  be  used. 

A  popular,  open-loop  method  of  estimating  soil  parameters  involves  pedotransfer 
functions  (Bouma,  1989;  Wosten  et  al.,  2001),  which  are  used  to  estimate  critical  but 
unavailable  variables  such  as  soil  water  holding  limits  and  saturated  hydraulic 
conductivity  from  more  readily  available  data.  These  functions  are  typically  implemented 
with  regression  models  (Gupta  and  Larson,  1979;  Rawls  et  al.,  1982;  Saxton  et  al.,  1986), 
although  other  approaches  are  being  increasingly  used,  including  fractals  (Gimenez  et  al., 
1997; Perrier et al., 1996; Rawls and Brakensiek, 1995), classification and regression trees
(Rawls and Pachepsky, 2002), and neural networks (Minasny et al., 1999; Schaap and
Leij, 1998; Schaap et al., 1998).

The  data  used  as  pedotransfer  function  inputs  are  usually  textural  fractions,  organic 
matter  content,  and  bulk  density,  although  recent  research  has  sought  to  improve  the 
predictive  quality  of  pedotransfer  functions  by  incorporating  other  data  such  as  soil 
structure (Pachepsky and Rawls, 2003) and topographic variables (Pachepsky et al.,
2001;  Rawls  and  Pachepsky,  2002). 

Many  of  these  input  variables  are  not  always  available,  especially  at  the  spatial 
sampling  density  needed  in  a  precision  agriculture  context.  In  many  practical  cases,  the 
only  existing  data  source  is  the  soil  survey  (SS),  which  associates  the  description  of  a 
"representative"  soil  profile  to  a  mapping  unit  that  may  represent  a  large  fraction  of  the 
field.  The  description  is  typically  expressed  in  terms  of  textural  class  per  soil  horizon. 

Applying  the  pedotransfer  function  approach  to  SS  data  and  assigning  the  estimated 
soil  parameters  to  an  entire  SS  mapping  unit  may  introduce  errors  into  SCM  results  due 
to  a)  the  spatial  heterogeneity  and  scale-dependent  behavior  of  actual  spatially  distributed 
soil  properties  vs.  their  representation  by  a  single  lumped  value  per  grid  cell,  obtained 
from laboratory measurements performed on soil cores (Brakensiek et al., 1981; Gijsman
et  al.,  2002);  b)  biases  and  errors  specific  to  the  chosen  pedotransfer  function  (Gijsman  et 
al.,  2002);  and  c)  errors  due  to  imprecise  soil  type  delimitation  throughout  the  landscape. 
Thus,  the  open-loop  approach  described  above  may  be  inappropriate  in  practical  SCM 
applications  in  precision  agriculture. 

An  alternative,  closed-loop  approach  is  to  abduct  parameter  values  with  an  inverse 
modeling (IM) scheme. Some form of optimization algorithm such as simulated
annealing (Kirkpatrick et al., 1983; Metropolis et al., 1953) is used to propose an optimal
combination  of  parameters  for  each  cell  that  minimizes  an  objective  function  such  as  the 
RMSE  between  simulated  and  observed  crop  yield.  Paz  et  al.  (2001)  and  Irmak  et  al. 
(2001)  showed  examples  of  this  approach.  However,  in  these  studies  the  link  between  the 
parameter  estimates  and  reality  was  the  optimal  match  between  simulated  and  observed 
yield  across  two  or  three  crop  seasons.  The  effect  of  yield-modifying  processes  not 
contemplated  by  the  model  (such  as  topographic  water  redistribution)  could  thus  be 
spuriously  explained  by  the  IM  process  through  the  model  parameters,  as  shown  in 
Chapter  2.  Balancing  the  need  to  identify  and  represent  these  processes  in  the  SCM  with 
practical  limitations  in  runtime  and  data  collection  requirements  poses  a  great  challenge 
for  spatial  crop  modeling. 

Other  data  assimilation  techniques,  such  as  Kalman  filtering  (Wikle  and  Cressie, 
1999)  and  Bayesian  parameter  estimation  (Omlin  and  Reichert,  1999;  Qian  et  al.,  2003), 
have  been  applied  to  the  spatiotemporal  modeling  of  environmental  systems,  especially  in 
meteorologic  and  oceanographic  studies  (Dowd  and  Meyer,  2003;  Natvik  et  al.,  2001; 
Vallino,  2000).  Bayesian  methods  have  also  been  applied  to  the  parameterization  of 
agronomic and hydrologic models (Durand et al., 2002; Makowski et al., 2002; Thiemann
et  al.,  2001).  These  methods  make  use  of  existing  a  priori  data  and  observations  to  update 
the  model's  parameters  and/or  state. 

For  SCMs  to  be  practical  as  decision-support  tools  in  industry,  crop  consultants 
must  be  able  to  parameterize  them  easily  and  rapidly  (J.R.  Murdock,  pers.  comm.), 
having  a  minimum  of  quantitative  information  regarding  model  parameters  and  their 
probability  distributions.  This  may  preclude  the  use  of  complex  spatially-coupled  models 
parameterized  by  inverse  modeling,  due  to  the  computational  complexity  of  solving  the 
underlying  combinatorial  optimization  problem. 

Soil properties frequently exhibit spatial associations (Goovaerts, 1997), although
IM  schemes  used  to  date  for  soil  parameter  estimation  in  SCMs  have  ignored  this  spatial 
context  except  by  considering  landscape  position  as  a  way  of  selecting  parameters  to 
optimize (Irmak et al., 2001). This spatial context can operate at the parameter level
rather  than  the  process  level.  Instead  of  the  aforementioned  tightly  coupled  landscape 
cells  that  interchange  water  and  nutrients  on  a  daily  basis  in  a  spatial  crop  model,  crop 
simulation  could  be  performed  in  individual  cells  as  per  the  approach  of  Paz  et  al.  (2001); 
the  spatial  context  could  work  by  subjecting  the  parameters  to  a  number  of  quantitative  or 
qualitative  constraints.  Such  a  scheme  could  have  lighter  computational  and  quantitative- 
data-requirement  burdens  than  other  aforementioned  data  assimilation  techniques.  We 
called this parameter-level model the spatial parameter model (SPM).

Expert  opinion  is  an  important  source  of  information  regarding  agricultural  crops' 
spatiotemporal  behavior.  Farmers,  crop  consultants,  and  soil  scientists  possess  a  wealth  of 
knowledge  about  agricultural  systems  that  can  be  harnessed  to  provide  parameter 
constraints.  However,  much  of  an  expert's  knowledge  is  in  qualitative  form,  and, 
irrespective  of  its  qualitative  or  quantitative  nature,  the  expert  knowledge  must  be  elicited 
from  its  owner,  who  may  be  unable  to  formalize  it  without  external  help.  There  exist 
numerous  techniques  for  this  knowledge  elicitation  (Diaper,  1989).  One  such  technique  is 
Bayesian  networks  (Pearl,  1988),  the  use  of  which  was  exemplified  in  Chapter  6. 

We  hypothesized  that,  using  practically  available  data  and  expert  opinion,  the 
spatial context in an agricultural field can be modeled and used to constrain the parameter
estimation process of an SCM in a way that improves its predictive ability while
preserving  its  compatibility  with  practical  applications.  In  this  study  we  developed 
methods  that  will  allow  testing  of  this  hypothesis. 

The  objectives  of  this  study  were  to  qualitatively  and  quantitatively  define  a  spatial 
parameter  model  (SPM)  using  available  sources  of  knowledge,  to  apply  it  to  the 
parameterization  by  IM  of  a  simple  spatial  crop  model  (SCM),  and  to  evaluate  the  SCM 
in  a  representative  case  study,  including  a  comparison  with  the  spatially-coupled  model 
developed  in  Chapter  2. 

Materials  and  Methods 

This  section  is  structured  in  four  parts: 

•  Description  of  the  case  study,  including  the  data  available  for  supporting  the 
construction  of  the  SPM. 

•  Description  of  the  parameter  estimation  process,  including  the  IM  framework,  the 
SPM  and  the  criteria  used  therein,  and  how  those  criteria  were  populated  with  data. 

•  Description  of  the  uncoupled  SCM  (used  with  the  IM  framework)  and  of  the 
spatially-coupled  SCM  (derived  from  Chapter  2  and  used  for  comparison  with  the 
IM  framework  results);  description  of  the  data  used  to  drive  them. 

•  Description of the analyses we performed on the data.

Case study: the Suggs 4 Field

We  studied  a  field  near  Murray,  Kentucky:  Suggs  4  (36°  32.3'  N,  88°  27.5'  W,  elev. 
222  m).  This  field  is  part  of  Ponderosa  Farms,  owned  and  farmed  by  Rick  Murdock.  This 
field  has  been  managed  using  no-till  and  approximately  the  same  (two  year)  rotation 
scheme for over two decades. The rotation consists of maize (Zea mays L.) grown on the
field in the first year (planted around April 10), and wheat (Triticum aestivum L.) and
soybeans (Glycine max L. Merr.) grown in the second year.

Elevation and topographic attributes

We  used  GPS-measured  elevation  to  describe  the  landscape.  AgConnections,  Inc., 
surveyed  the  Suggs  4  field  in  November  of  2000  using  a  pair  of  survey-grade  real-time 
kinematic  (RTK)  GPS  units.  One  of  these  was  used  as  a  base  station  and  the  other  was 
mounted  on  an  all-terrain  vehicle. 

The  RTK  approach  is  recognized  as  a  very  accurate  GPS-based  technique  for 
obtaining  elevation  data  for  use  in  site-specific  agriculture  (Renschler  et  al.,  2002).  Clark 
and  Lee  (1998)  obtained  vertical  errors  of  4-9  cm  using  this  technique.  However,  Wilson 
et  al.  (1998)  warned  that  relatively  small  elevation  errors  across  different  landscape 
positions  could  result  in  large  variations  of  hydrologically  relevant  topographical 
attributes  calculated  from  those  elevation  data,  and  recommended  using  the  maximum 
practical  sampling  density. 

Consistent  with  this  prescription,  the  data  used  for  this  study  were  collected  on-the- 
go  with  the  vehicle-mounted  GPS  unit,  on  15-meter  swaths.  There  were  7030  sampling 
locations,  shown  in  Figure  7-1.  The  gaps  shown  were  caused  by  problems  in  the 
measurement  of  soil  electroconductivity  (EC).  This  variable  was  measured  concurrently 
with  the  RTK  elevation  measurements.  When  an  EC  measurement  was  not  valid, 
typically  due  to  wheat  residue  collecting  on  the  electrodes,  the  whole  record  (including 
elevation)  corresponding  to  that  measurement  was  lost. 

We  used  GS+  for  Windows,  version  5.0  (Gamma  Design  Software,  2001)  to 
estimate  a  semivariogram  for  the  elevation  data.  We  then  used  ordinary  point  kriging  to 
resample  the  original  data  onto  a  10-meter  isometric  grid.  These  data  were  in  turn  used  to 
calculate  a  wetness  index,  or  WI  (Beven  and  Kirkby,  1979),  using  a  topographic  index 
calculation  utility  distributed  with  TOPMODEL  (Beven  et  al.,  1995). 
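
As a minimal illustration of the index itself, the sketch below evaluates the Beven and Kirkby (1979) form, WI = ln(a / tan β), for a single grid cell. It is not the TOPMODEL utility used in this study. The function name, the argument names, and the minimum-slope guard are assumptions introduced here for illustration, and the specific contributing area a (upslope area per unit contour length) is assumed to have been derived beforehand from the kriged elevation grid.

```python
import math


def wetness_index(specific_area_m, slope_rad, min_slope_rad=1e-3):
    """Topographic wetness index ln(a / tan(beta)) for one grid cell.

    specific_area_m: upslope contributing area per unit contour length (m).
    slope_rad: local slope angle beta, in radians.
    min_slope_rad: hypothetical guard that keeps flat cells finite.
    """
    if specific_area_m <= 0:
        raise ValueError("specific contributing area must be positive")
    slope = max(slope_rad, min_slope_rad)
    return math.log(specific_area_m / math.tan(slope))


# A flat cell with a large contributing area scores higher (wetter)
# than a steep cell with a small contributing area.
depression = wetness_index(500.0, math.radians(0.5))
ridgetop = wetness_index(10.0, math.radians(5.0))
```

Cells with large contributing areas and low slopes receive high values, which is why the index is used later in this chapter to flag areas prone to saturation.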

Figure 7-1: Suggs 4 field. The dots show the sampling locations for RTK elevation and
Veris electroconductivity data.

Soil  electroconductivity 

Soil  electroconductivity  (EC)  has  been  used  in  several  studies  to  predict  soil 
properties  (Johnson  et  al.,  2001).  A  set  of  EC  data  was  collected  by  AgConnections,  Inc. 
concurrently  with  the  elevation  data  set  in  the  winter  of  2000.  The  equipment  used  was  a 
Veris  5600.  This  device  has  two  sets  of  coulter-shaped  electrodes  that  produce  two 
different attributes for each sample location: the "surface" measurement, which is meant to
represent soil electroconductivity from 0 to 60 cm, and the "deep" measurement, which is
representative of the characteristics to a depth of 150 cm.

We used GS+ 5.0 to estimate semivariograms for the EC data. We then used
ordinary block kriging to resample the two original data layers onto a 10-meter isometric
grid. These data were later used to assist discussion with domain experts.

Soil data

Soils  in  the  region  were  formed  in  loess  approximately  120  cm  thick,  and  tend  to 
have  a  fragipan  at  a  landscape-position-dependent  depth  (USDA  SCS,  1973).  The 
fragipan  limits  root  growth  and  causes  perched  water  tables  during  the  late  winter  and 
early  spring,  when  there  is  steady  precipitation,  low  evapotranspiration  due  to  low 
temperatures and solar radiation, and low (or nonexistent) plant water demand.


Figure  7-2:  Original  division  of  the  Suggs  4  field  into  soil  types,  adapted  from  soil  survey 
data  (USDA  SCS,  1973).  Each  oval  identifies  its  corresponding  map  unit.  The 
parenthesized elements set apart map units having the same soil type.

The field tends to slope downward from northwest to southeast, with an elevation
range  of  approximately  five  meters.  The  soil  types  in  the  field  (Figure  7-2)  generally 
correspond  to  different  landscape  positions  (USDA  SCS,  1973).  The  ridgetop  contains 
Loring  B  soils;  the  well-eroded  slope  on  the  northwest  border  of  the  field  has  Loring  C2. 
These  soils  are  described  as  being  moderately  well  drained,  somewhat  eroded,  and  having 
a fragipan. The sideslopes have Grenada soils. These are relatively deep and moderately
well-drained  soils,  somewhat  eroded,  but  considered  the  best  soils  in  the  region  by  our 
domain  experts.  Fairly  shallow  and  fairly  poorly  drained  Calloway  soils  occupy  lower 
positions.  This  sequence  of  soils  is  typical  of  the  area,  which  corresponds  to  the  Grenada- 
Calloway  Association  (USDA  SCS,  1973). 

Yield  maps 

Maize  yield  data  were  available  for  the  1997,  1999,  and  2001  cropping  seasons. 
Soybean  data  were  available  for  1998  also.  The  farmer  collected  yield  maps  using  a 
scale-calibrated  yield  monitor.  In  1997-1999  he  used  a  GreenStar  yield  monitor;  in  2001 
he  used  an  Ag  Leader  3000.  Yield  map  lag  times  were  corrected  using  the  yield  import 
tool  found  in  AgLink  Professional,  version  5.5  (AGRIS  Corporation,  1998).  We  used 
GS+  5.0  to  estimate  a  semivariogram  for  each  yield  map.  We  then  used  ordinary  block 
kriging to resample the lag-corrected data onto a 10-meter isometric grid.

These  yield  maps  were  used  as  the  observed  reference  against  which  to  compare 
model  simulations,  and  also  to  generate  a  multi-year,  aggregated  yield  product  called  a 
normalized  yield  (NY)  map.  This  map  was  obtained  by  registering  the  different  maps  to  a 
common  grid,  expressing  the  yield  of  each  map's  cells  as  a  percentage  of  the 
corresponding  year's  average  yield  for  the  field,  and  averaging  the  co-registered  cell 
values  across  years.  The  NY  map  was  built  using  the  1997  and  1999  maize  and  the  1998 
soybean  data,  and  was  used  together  with  individual  year  yield  maps  to  assist  discussion 
with  domain  experts. 
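
The normalization described above can be sketched in a few lines. This is an illustrative reading of the procedure, not the code used in the study: it assumes the yearly maps have already been kriged onto the same grid and flattened into equal-length lists of cell values, and the function and variable names are hypothetical.

```python
def normalized_yield(yield_by_year):
    """Average, across years, of each cell's yield as a percentage of
    that year's field mean.

    yield_by_year: dict mapping a year to a list of cell yields; every
    list must refer to the same co-registered grid cells, in the same order.
    """
    n_cells = len(next(iter(yield_by_year.values())))
    percent_by_year = []
    for year, cells in yield_by_year.items():
        if len(cells) != n_cells:
            raise ValueError(f"year {year} is not on the common grid")
        field_mean = sum(cells) / len(cells)
        percent_by_year.append([100.0 * y / field_mean for y in cells])
    # zip(*...) regroups the per-year percentages by cell.
    return [sum(cell) / len(cell) for cell in zip(*percent_by_year)]


# Two cells, three seasons with very different absolute yield levels
# (made-up numbers): the second cell is consistently the better one.
ny = normalized_yield({1997: [8.0, 12.0], 1998: [2.0, 2.0], 1999: [9.0, 11.0]})
```

Because each year is expressed relative to its own field mean, crops with different absolute yield levels (here maize and soybean) can be combined into a single map.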

Soil water data

We  collected  a  spatiotemporal  soil  water  content  dataset.  It  was  used  for  evaluating 
the  parameter  sets  obtained  in  the  parameterization  process  rather  than  for 
parameterization proper. The data were collected in the field on 11 dates between March
27  and  September  17,  2001,  enveloping  the  maize  cropping  season.  We  used  a  Trime 
FM3  time  domain  reflectometry  (TDR)  system  with  plastic  access  tubes.  Data  were 
measured  at  15-cm  depth  intervals  at  each  location.  The  measurements  for  the  top  depth 
range,  0-15  cm,  were  taken  with  a  three-prong  FM1  electrode,  whereas  the  deeper 
measurements  were  taken  using  access  tubes.  The  measurements  were  replicated  three 
times. 

When  the  soil  dried,  it  became  so  hard  and  restrictive  that  inserting  the  three-prong 
electrode  into  the  soil  was  not  possible  without  danger  of  damaging  the  device.  We 
solved  this  problem  with  a  custom-built  pre-drilling  device  (Figure  7-3)  that  used 
sharpened  stainless  steel  bars  of  similar  diameter  and  identical  spacing  to  that  of  the  FM1 
electrodes  to  pre-drill  holes  into  which  we  inserted  the  TDR  electrodes. 

The  tubes  were  installed  according  to  the  recommended  practice  of  the 
manufacturer,  iteratively  a)  inserting  the  tube  into  the  ground  a  few  inches  by  using  a 
mallet  (or  a  modified  post  driver,  in  the  higher-clay  soils)  to  beat  a  ramming  head  affixed 
both  to  the  tube  and  to  a  pipe  running  the  length  of  the  tube  (to  rest  on  the  metal  cutting 
edge  at  the  bottom  of  the  tube),  and  b)  using  an  auger  to  extract  the  material  collected 
inside  the  pipe. 


Parameter Estimation Process

Knowledge elicitation

Integrating  multiple  sources  of  knowledge  presents  general  problems  such  as 
putting  all  the  knowledge  in  a  form  that  can  be  used  in  a  common  framework.  Eliciting 
complex  biophysical  information  from  precision  farming  domain  experts  also  poses  a  set 
of  problems,  not  least  among  them  the  need  to  develop  a  common  language  between  the 
crop  system  modeler  and  the  experts. 

I obtained the cooperation of several domain experts: Rick Murdock (farmer and
crop consultant), Pete Clark (crop consultant), and John Potts (agronomist), all with
AgConnections, Inc., and Jerry Mcintosh, a soil scientist with the NRCS (U.S. Natural
Resources Conservation Service). I made four weeklong visits to the headquarters of
AgConnections  during  March,  May,  July,  and  August  2002,  meeting  with  the  experts  at 
least  twice  per  visit,  for  several  hours  at  a  time.  The  knowledge  elicitation  activities 
performed  during  these  meetings  included  the  following: 

•  Delimiting  and  qualitatively  modeling  the  physical  system  of  interest  using  causal 
diagrams.  The  process  included  whiteboard  discussions  and  interactive  work  using 
Netica  Bayesian  network  software  version  1.12  (Norsys,  1998)  and  Microsoft 
PowerPoint  presentation  software  (Microsoft  Corp.,  1999). 

•  Identifying  relevant  parameters.  This  diagram  was  built  collectively  with  the 
domain  experts  and  provided  an  effective  way  of  explaining  and  discussing  how  a 
crop  model  operates,  and  of  intuitively  communicating  ideas  such  as  the  link 
between  limiting  factors  and  parameter  sensitivity. 

•  Adopting  a  system  for  translating  between  qualitative  and  quantitative  probability 
scales.  We  adopted  a  slightly  modified  version  of  the  scale  proposed  by  Renooij 
and  Witteman  (1999). 

•  Eliciting  probabilities  about  spatial  parameter  relationships,  expected  sensitivity  of 
the  crop  model  to  specific  parameters,  and  other  quantitative  questions  about  the 
system. 

To  maximize  the  reliability  of  the  causal  diagrams  arising  from  our  process,  we 
strove  to  achieve  consensus  among  the  multiple  experts  for  both  the  qualitative  and 
quantitative  aspects  of  the  models,  following  Nadkarni  and  Shenoy  (2001),  and  revisited 
the  diagrams  during  successive  visits  to  confirm  the  validity  of  (or  to  update)  the  maps  as 
our  understanding  increased. 

Updating  the  soil  map,  selecting  the  simulation  locations 

As  stated  before,  soil  survey  maps  provide  a  limited  depiction  of  the  spatial 
variability  of  soil  properties.  Traditionally,  soil  mapping  unit  borders  have  been  inferred 
visually  using  aerial  photos,  or  else  drawn  on  a  map  by  a  soil  scientist  familiar  with  the 
area  but  with  limited  resources  with  which  to  take  dense  samples  (J.  Mcintosh,  pers. 
comm.).  This  can  result  in  the  omission  of  soil  mapping  units  or  in  delimitation  errors 
(Nolin  and  Lamontagne,  1991). 

One  of  the  objectives  arising  from  our  meetings  with  domain  experts  was  to  update 
the  soil  map  as  a  useful  step  before  selecting  sites  for  simulations.  This  update  was  done 
by  examining  other  sources  of  data  and  looking  for  information  inconsistent  with  the 
labeled  soil  type.  We  used  the  following  (previously  described)  sources  of  data: 

•  Elevation  (EL):  defines  landscape  position. 

•  Wetness  index  (WI):  identifies  areas  prone  to  saturation  due  to  their  high 
contributing  areas  and/or  low  slopes. 

•  Veris  Electroconductivity  (EC):  an  indicator  of  soil  erosion  and  wetness. 

•  Normalized  yield  (NY):  an  indicator  of  temporal  stability. 

•  Original soil survey map (SS) (USDA SCS, 1973).

During  this  process  the  participants  delimited  areas  on  the  maps  and  built 
hypotheses  about  the  soil  type  corresponding  to  the  observed  combination  of  EL,  WI,  EC, 
NY,  and  SS.  These  hypotheses  were  discussed  within  the  group,  and  were  later  tested  in 
the  field.  For  the  latter  we  extracted  cores  to  a  depth  of  122  cm  with  a  soil  probe  and 
discussed  the  extracted  material. 

In  Chapter  3  we  developed  automatic  spatial  sampling  methods  that  minimize  some 
proxy  for  prediction  error  (of  spatial  interpolation)  such  as  kriging  variance  (Deutsch  and 
Journel, 1992) or the Minimization of the Mean of Shortest Distances (MMSD) criterion
(van  Groenigen  and  Stein,  1998).  We  also  showed  how  this  idea  could  be  used  to  capture 
the  spatial  variability  of  crop  yield  from  a  limited  set  of  points.  We  elaborated  further  in 
Chapter  4,  showing  how,  in  order  to  best  predict  the  spatiotemporal  variability  of  a 
spatially  autocorrelated  random  field  from  a  limited  subset  of  sample  locations,  the  best 
spatial  sampling  scheme  included  locations  with  temporally  stable  data,  which  captured 
the  spatial  nonstationarity  of  the  field. 

These  concepts  and  outcomes  from  Chapters  3  and  4  were  used  in  the  present  study 
to  select  the  simulation  domain  for  the  IM  process.  Instead  of  using  a  grid,  we  chose  to 
simulate  locations  showing  characteristic  behaviors  throughout  the  field.  We  limited 
ourselves  to  one  location  per  original  soil  survey  mapping  unit,  plus  one  location  per  new 
(updated)  soil  map  area.  Very  small  areas  were  discarded  in  order  to  keep  the  IM  problem 
relatively  small.  We  obtained  13  locations  from  this  process;  they  will  be  described  in 
detail in the Results and Discussion section.

The IM framework

The  parameter  estimation  problem  is  a  nonlinear  multiobjective  optimization 
problem  that  seeks  the  parameter  set  capable  of  best  satisfying  a  set  of  criteria.  The 
multiple  criteria  are  aggregated  into  a  single  objective  function,  and  the  optimization 
process is structured like a simulated annealing (Kirkpatrick et al., 1983) algorithm, having a
generating  mechanism,  an  acceptance  criterion,  and  a  cooling  schedule. 

In  each  algorithm  iteration  the  generating  mechanism  randomly  selects  one  of  the 
aforementioned  13  landscape  cells  and  proposes  a  perturbation  of  its  parameters.  The 
criteria  are  evaluated  for  the  new  parameter  set,  an  aggregation  operator  is  used  to 
generate  a  unique  measure  of  fitness  from  the  multiple  criteria,  and  the  difference 
between  this  fitness  value  and  the  fitness  of  the  previous  parameter  combination  is  fed  to 
an  acceptance  criterion  following  Aarts  and  Korst  (1990).  The  cooling  schedule  makes  it 
progressively  more  difficult  for  the  algorithm  to  accept  new  parameter  combinations  that 
do  not  produce  an  improvement  in  the  fitness  value. 
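
The loop described above can be sketched as follows. This is a generic simulated annealing skeleton under stated assumptions, not the implementation used in this study: the geometric cooling schedule, the default temperatures, and all function and argument names are hypothetical, and the acceptance test is the standard Metropolis form. The objective function and the perturbation rule are supplied by the caller.

```python
import math
import random


def anneal(initial_cells, objective, perturb, t_start=1.0, t_end=1e-3,
           alpha=0.95, steps_per_temp=50, seed=0):
    """Minimize objective(cells) over a list of per-cell parameter dicts.

    objective: maps the full list of cell parameters to a fitness value
        (lower is better).
    perturb: maps (cell_parameters, rng) to a new parameter dict.
    alpha: geometric cooling factor (an assumption of this sketch).
    """
    rng = random.Random(seed)
    current = [dict(cell) for cell in initial_cells]
    current_cost = objective(current)
    best, best_cost = [dict(cell) for cell in current], current_cost
    temperature = t_start
    while temperature > t_end:
        for _ in range(steps_per_temp):
            # Generating mechanism: perturb one randomly selected cell.
            i = rng.randrange(len(current))
            candidate = [dict(cell) for cell in current]
            candidate[i] = perturb(candidate[i], rng)
            cost = objective(candidate)
            delta = cost - current_cost
            # Acceptance criterion: always accept improvements; accept a
            # worse set with probability exp(-delta / temperature).
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current, current_cost = candidate, cost
                if cost < best_cost:
                    best, best_cost = [dict(c) for c in candidate], cost
        # Cooling schedule: worse sets become progressively harder to accept.
        temperature *= alpha
    return best, best_cost
```

In this study the objective would be the aggregated fitness value computed from the criteria, with each evaluation requiring crop model runs; the sketch leaves that function abstract.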

Figure  7-4  shows  the  different  components  of  the  IM  framework.  The  three  boxes 
along  the  top  of  the  diagram  represent  the  available  sources  of  knowledge: 

•  Data:  history  of  crop  yield,  weather,  elevation,  soil  survey,  etc., 

•  Expert  opinions  about  the  processes  occurring  in  the  field  and  their  spatial 
variability,  as  can  be  elicited  from  farmers,  crop  consultants,  and  soil  scientists,  and 

•  Knowledge  about  the  effects  of  environmental  conditions  on  crop  growth  and 
development,  obtained  through  a  crop  model  such  as  CERES  or  CROPGRO. 

The four boxes below the knowledge sources in Figure 7-4 represent the four
criteria that we propose for evaluating the appropriateness of a given parameter set:

•  Yield  history.  This  is  the  classic  criterion  heretofore  used  in  SCM-related  inverse 
modeling:  the  RMSE  between  simulated  and  observed  yield  throughout  the  field. 

•  Parameter  sensitivity.  An  important  agricultural  concept  is  the  limiting  factor, 
based  on  von  Liebig's  Law  of  the  Minimum  (van  der  Ploeg  et  al.,  1999):  crop  yield 
is  determined  by  the  amount  of  the  essential  input  (nutrients,  water,  CO2,  light,  etc.) 
in  shortest  supply.  If  the  deficient  input  is  supplied,  yields  can  improve  to  the  point 
where  another  input  becomes  limiting,  etc.  Although  crops  do  not,  sensu  stricto, 
behave  according  to  this  law  (Sinclair  and  Park,  1993),  it  is  nonetheless  a  valuable 
approximation  for  our  purposes  because  domain  experts  can  relate  to  it.  Domain 
experts  can  frequently  identify  the  most  important  factor  (for  example,  soil  depth) 
that  limits  crop  yield  in  different  parts  of  a  field.  The  limiting  factor  frequently 
corresponds  to  a  soil  parameter  used  by  the  SCM.  When  this  happens,  small 
changes  in  this  parameter  should  affect  yield  to  a  greater  extent  (i.e.,  the  parameter 
should  be  more  sensitive)  than  small  changes  in  other  parameters. 

•  Geostatistical.  The  parameter  set  obtained  through  the  estimation  process  should 
have  a  spatial  covariance  structure  equivalent  to  that  of  the  corresponding  "real" 
soil  parameters.  Knowing  the  latter  is  not  possible  because  the  real  parameter 
values  (and  hence,  their  spatial  covariance  structure)  are  unknown,  but 
approximations  are  possible  using  an  observable  proxy  variable,  or  a  covariance 
model  may  be  taken  from  the  literature. 

•  Soil  map  unit  neighborhood.  When  an  agricultural  field  is  spatially  variable,  parts 
of  it  tend  to  behave  consistently  with  respect  to  others;  for  example,  soil  water 
content  will  tend  to  be  higher  in  depressions  than  in  ridgetops.  As  discussed  in 
Chapter  4,  this  spatial  nonstationarity  implies  the  temporal  stability  defined  by 
Vachaud  et  al.  (1985).  According  to  the  neighborhood  criterion,  given  two  sets  of 
parameters,  whichever  one  best  reproduced  an  expected  pattern  of  temporal 
stability throughout the field would be preferable.

Figure 7-4: IM framework for the proposed SCM parameter estimation process.


The  four  parameter-evaluation  criteria 

The yield history criterion for a crop year $i$, $\mathrm{YHC}_i$, was defined as follows:

$$\mathrm{YHC}_i = 1 - e^{-k \cdot \mathrm{SMSYE}_i} \qquad (7\text{-}1)$$

where $k$ is a constant and $\mathrm{SMSYE}_i$ is the scaled mean squared yield error for year $i$, equal
to the mean squared error between simulated and observed yields for year $i$ across the
locations of interest, scaled by that year's observed data variance. In equation form,

$$\mathrm{SMSYE}_i = \frac{1}{N} \sum_{j=1}^{N} \frac{\left(Y_{ij} - \hat{Y}_{ij}\right)^2}{\sigma_i^2(\hat{Y}_{ij})} \qquad (7\text{-}2)$$

where $Y_{ij}$ is simulated yield for year $i$, location $j$, $\hat{Y}_{ij}$ is the corresponding observed yield,
$\sigma_i^2(\hat{Y}_{ij})$ is the observed year $i$ yield variance, and $N$ is the number of locations of interest.

The values of Equation 7-1 vary greatly depending on the value of parameter k
(Figure 7-5). A higher k penalizes yield error more harshly than lower values. The value
of k could conceivably be trained from data, or selected based on the user's confidence in
a particular year's data quality or the lack thereof due to extraneous yield-limiting factors
not considered by the crop model, etc. However, we arbitrarily chose a base value of
k = 1 to provide sensitivity at high error levels. Exploring the sensitivity of the IM results
to different values of k will be the object of additional future study.
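
A small numerical sketch may help show how k shapes the criterion. It assumes that Equation 7-1 has the form YHC_i = 1 - exp(-k · SMSYE_i) and that SMSYE_i is the mean squared yield error divided by the variance of that year's observed yields, as the surrounding text describes. The function names and the use of the population variance are assumptions made here for illustration.

```python
import math


def smsye(simulated, observed):
    """Scaled mean squared yield error for one year.

    Mean squared error between simulated and observed yields across the
    locations of interest, divided by the variance of the observed yields
    (population variance, an assumption of this sketch).
    """
    n = len(observed)
    if n == 0 or len(simulated) != n:
        raise ValueError("need matching, non-empty yield lists")
    mean_obs = sum(observed) / n
    var_obs = sum((o - mean_obs) ** 2 for o in observed) / n
    if var_obs == 0:
        raise ValueError("observed yields have zero variance")
    mse = sum((s - o) ** 2 for s, o in zip(simulated, observed)) / n
    return mse / var_obs


def yield_history_criterion(simulated, observed, k=1.0):
    """0 for a perfect match, approaching 1 as the scaled error grows."""
    return 1.0 - math.exp(-k * smsye(simulated, observed))


# Made-up yields at three locations; every simulation is 1 unit too high.
observed = [8.0, 10.0, 12.0]
simulated = [9.0, 11.0, 13.0]
yhc_k1 = yield_history_criterion(simulated, observed, k=1.0)
yhc_k3 = yield_history_criterion(simulated, observed, k=3.0)
```

For the same yield error, the larger k returns a value closer to 1, which is the harsher penalty mentioned above.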

Figure 7-5: Dependence of the yield history criterion for year i (YHC_i) on the value of
parameter k. The SMSYE_i variable is the mean squared error, across all
locations of interest, between simulated and observed yield values for year i,
scaled by the yield variance during that year.

Separate  instances  of  the  yield  history  criterion  are  created  for  each  year  of 
available  yield  data,  and  the  results  are  aggregated,  together  with  the  results  of  all  the 
other  criteria  (soil  map  unit  neighborhood,  etc.),  at  the  objective  function  level.  This  is 
somewhat  different  from  the  approach  taken  by  other  authors  such  as  Irmak  et  al.  (2001) 
and  Paz  et  al.  (2001),  who  calculated  the  RMSE  across  both  years  and  locations.  We 
believe  this  latter  methodology  is  prone  to  biases  because  the  variance  of  a  spatial  yield 
dataset  varies  greatly  from  year  to  year,  depending  primarily  on  weather  conditions:  good 
years  with  adequate  rainfall  tend  to  have  less  variable  yield  than  very  dry  or  very  wet 
years.  Thus,  the  variability  of  dry  years  would  tend  to  dominate  a  criterion  based  on  the 
RMSE  of  several  yield  years. 

The  soil  map  neighborhood  criteria  can  be  based  on  the  spatial  relationships  of 
model  parameter  values  (such  as  soil  depth  or  the  SCS  curve  number)  or  model  outputs 
(e.g.,  soil  water).  The  criteria  are  expressed  in  terms  of  compliance  with  a  series  of 
constraints,  which  are  inequalities  having  two  attributes:  type  (greater  than,  less  than, 
equal)  and  strength  (strong,  medium,  weak).  The  constraints  are  defined  using  the 
constraint  functions  shown  in  Figure  7-6.  The  difference  between  the  values  (of  soil 
depth,  for  example)  at  two  locations  of  interest  is  plotted  on  the  x-axis,  and  the  level  of 
satisfaction  of  the  constraint  is  the  corresponding  y-axis  value  of  the  constraint  function. 



Figure 7-6: Functions used for evaluating the neighborhood constraints. The x-axis is the
difference between the values of the model parameter (or output) of interest at
two locations (a and b); the y-axis provides the corresponding error function
value. The three cases correspond to the different constraint types: A should
be used when the constraint is that (v_a - v_b) > 0; B should
be used when the constraint is that (v_a - v_b) < 0; C should be used when the
constraint is that (v_a - v_b) ≈ 0.

The type of constraint determines which one of the curves to use (Figure 7-6A for
greater  than,  Figure  7-6B  for  less  than,  Figure  7-6C  for  equality),  and  the  strength  of  the 
relationship  determines  how  steep  the  slopes  are  (a  "strongly  greater  than"  would  have  a 
steeper  slope  than  a  "weakly  greater  than").  Compliance  of  the  parameters  to  these 
constraints  is  scaled  from  zero,  corresponding  to  full  compliance,  to  one,  corresponding 
to  total  non-compliance.  This  scaling  results  from  the  definition  of  the  IM  problem  as  a 
minimization  of  the  objective  function.  (The  objective  function  will  be  described  in  detail 
below).  Within  the  neighborhood  criterion,  all  of  the  individual  constraint  function  results 
(which,  as  seen  in  Figure  7-6,  vary  in  the  range  [0,1])  are  aggregated  to  obtain  a  single 
value  to  represent  the  criterion.  This  value  is  also  in  the  interval  [0,1].  We  used  the 
arithmetic  mean  as  an  aggregation  operator  in  this  case,  reflecting  the  collective 
perception  that  a  linear  combination  of  the  criteria  was  appropriate. 
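
The evaluation of a neighborhood criterion can be sketched as follows. This is a minimal illustration, not the implementation used in this study: the curves of Figure 7-6 are assumed here to be piecewise-linear ramps, and the ramp widths in `RAMP_WIDTH`, like all function names, are hypothetical placeholders for the expert-derived thresholds.

```python
from statistics import mean

# Hypothetical ramp widths: a stronger constraint gets a narrower (steeper) ramp.
RAMP_WIDTH = {"strong": 0.1, "medium": 0.5, "weak": 1.0}


def constraint_error(va, vb, kind, strength):
    """Return non-compliance in [0, 1] for a constraint on d = va - vb."""
    d = va - vb
    width = RAMP_WIDTH[strength]
    if kind == "greater":    # wants d > 0 (assumed shape of Figure 7-6A)
        raw = -d / width
    elif kind == "less":     # wants d < 0 (assumed shape of Figure 7-6B)
        raw = d / width
    elif kind == "equal":    # wants d close to 0 (assumed shape of Figure 7-6C)
        raw = abs(d) / width
    else:
        raise ValueError(f"unknown constraint type: {kind}")
    # Clamp to [0, 1]: 0 is full compliance, 1 is total non-compliance.
    return min(1.0, max(0.0, raw))


def neighborhood_criterion(constraints):
    """Aggregate individual constraint errors with the arithmetic mean."""
    return mean(constraint_error(*c) for c in constraints)
```

For example, `neighborhood_criterion([(1.2, 1.0, "greater", "strong"), (0.0, 5.0, "greater", "strong")])` averages one fully satisfied and one fully violated constraint.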

The parameters of these functions could conceivably be obtained by training with a
large dataset, but the practical impossibility of obtaining such a dataset prompted us to
use nominal, expert-opinion-derived thresholds. Also, to keep the constraint networks as
sparse as possible, we only expressed relationships between adjacent soil units.

Objective function (aggregation of criterion results)

The  fitness  function  is  derived  from  the  previously  discussed  criteria,  some  of 
which  may  be  instantiated  more  than  once  (soil  map  unit  neighborhood  criteria  for 
different  parameters  and  model  results,  yield  history  criteria  for  multiple  years  of  yield 
maps,  etc.).  All  of  the  criteria  values  are  scaled  to  the  range  0  (full  compliance)  to  1 
(absolute  non-compliance),  and  are  aggregated  using  an  ordered  weighted  average 
operator  or  OWA  (Yager,  1988). 

Given two $n$-dimensional vectors $A = [a_1, a_2, \ldots, a_n]$ and $B = [b_1, b_2, \ldots, b_n]$ such that
$B$ is obtained by sorting $A$, making $b_j$ equal to the $j$th largest element of the $a_i$, an OWA
operator (of dimension $n$) is a mapping $f: \mathbb{R}^n \to \mathbb{R}$ with an associated $n$-dimensional
vector $W = [w_1, w_2, \ldots, w_n]$, such that three conditions are met:

1) $w_i \in [0, 1]$

2) $\sum_i w_i = 1$

3) $f(a_1, a_2, \ldots, a_n) = \sum_j w_j b_j$

Using the $B$ vector in the definition of $f$ (the OWA proper) above is what confers
nonlinearity to the OWA operator: a weight $w_i$ is not associated with a specific argument
$a_i$ but with the $i$th position within $B$ (sorted values of $A$) instead.

The  OWA  operator  is  a  very  flexible  aggregation  tool  that  can  implement,  through 
the  appropriate  selection  of  its  weights,  operators  including  the  median,  maximum, 
minimum,  and  a  large  class  of  means  and  other  summarizing  statistics  (Yager,  2003; 
Yager,  1993).  For  this  study  we  set  the  OWA  weights  according  to  a  form  of  Olympic 
aggregator,  analogous  to  the  scoring  methods  used  in  several  judged  events  in  the 
Olympics, which operate on ranked scores (Stefani, 1999). Assuming that we had $N$
criteria as the $a_i$ inputs, we gave the $N-2$ central weights of the $W$ vector values of
$1/(N-1)$, and gave the two extreme weights $w_1$ and $w_N$ a smaller weight, $1/(2N-2)$.
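
The Olympic weighting described above can be sketched in a few lines. This is an illustrative sketch rather than the code used in the study; the function names and the example criterion values are hypothetical. Note that the weights sum to one, since $(N-2)/(N-1) + 2/(2N-2) = 1$.

```python
def olympic_weights(n):
    """OWA weights: 1/(n-1) for the n-2 central ranks, 1/(2n-2) for the two extremes."""
    if n < 2:
        raise ValueError("need at least two criteria")
    w = [1.0 / (n - 1)] * n
    w[0] = w[-1] = 1.0 / (2 * n - 2)
    return w


def owa(values, weights):
    """Ordered weighted average: weights apply to ranks, not to specific inputs."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    ranked = sorted(values, reverse=True)  # b_j = j-th largest of the a_i
    return sum(w * b for w, b in zip(weights, ranked))


# Hypothetical criterion results, each scaled to [0, 1] (0 = full compliance).
criteria = [0.9, 0.2, 0.4, 0.1]
score = owa(criteria, olympic_weights(len(criteria)))
```

Because the weights attach to ranks, reordering `criteria` leaves `score` unchanged, and the largest and smallest criterion results each contribute half as much as a central one.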

The  purpose  of  adopting  an  Olympic  aggregator  relates  to  the  arcs  shown  linking 
the  knowledge  sources  and  the  criteria  in  Figure  7-4.  These  arcs  represent  the  level  of 
participation  of  the  knowledge  sources  in  each  criterion.  The  four  combinations  are 
different, so different combinations of confidence in the available knowledge would
affect our confidence in the four criteria differently. The Olympic aggregator
de-emphasizes extreme criteria results, allowing for varying levels of confidence in the
quality of the knowledge from the different sources.

There are alternatives to a fixed set of OWA weights determined a priori; for
example, obtaining the weights from data (Beliakov, 2003; Filev and Yager, 1988; Torra,
2000). However, implementing such a system would require a large set of input-output
combinations on which to train the system; this would be impractical for the current state
of precision agriculture, in which the number of available years of yield maps is still low.

Simulations

The  (uncoupled)  spatial  crop  model 

We  used  CERES-Maize  (Ritchie  et  al.,  1998)  as  our  uncoupled  SCM  in  the  IM 
process,  as  in  the  second  case  study  of  Chapter  5.  Also  as  in  that  case,  we  assumed  one 
environment  (equivalent  to  a  common  parameter  space  for  all  the  locations  of  interest), 
and  made  the  parameter  ranges  sufficiently  broad  to  accommodate  the  parameter  value 
combinations  of  all  the  locations  we  chose  to  simulate.  The  major  difference  with  respect 
to  the  Chapter  5  exercise  was  that  the  parameters  to  fit  in  this  chapter  resulted  from  the 
discussion  process:  we  built  a  qualitative  model  of  parameter  sensitivity  based  on  the 
limiting-factor  idea,  and  used  it  to  choose  sensitive  parameters. 

We  ran  the  IM  process  for  different  combinations  (scenarios)  of  the  available  yield 
data  years  (1997,  1999,  and  2001).  The  planting  date  reported  by  the  farmer  was  April  15 
(day  of  the  year,  DOY,  105)  for  all  three  years.  Row  spacing  was  76  cm;  the  farmer's 
targeted planting density was 6.25 plants/m². These simulation scenarios, including the
parameters  that  were  estimated,  and  any  additional  criteria  or  sources  of  knowledge 
involved,  will  be  described  in  greater  detail  later. 

The spatially-coupled crop model

We  modified  the  inverse-modeling  scheme  used  with  the  spatially-coupled  model 
of  Chapter  2  to  work  with  CERES-Maize  and  to  fit  the  parameters  selected  through  the 
discussion  process.  We  used  it  to  estimate  parameters  for  different  combinations  of  the 
available  years  of  yield  data  (1997,  1999,  and  2001)  in  the  Suggs  4  field,  under  the  same 
management  conditions  shown  above. 
Weather  data  needed  for  crop  simulations 

Mean  annual  rainfall  in  Murray  between  1970  and  1999  was  1407  mm,  distributed 
fairly  evenly  throughout  the  year  but  decreasing  somewhat  in  the  late  summer  and  early 
fall  (Figure  7-7).  There  is  significant  interannual  variability  of  monthly  precipitation 
(Figure  7-7),  which  frequently  results  in  late-season  droughts  for  maize  and  soybean 
crops. 

Figure 7-7: Monthly rainfall in Murray during 1970-99. The central point of each box
plot  shows  the  corresponding  month's  median  monthly  precipitation  over  the 
30-year  period.  Boxes  represent  inter-quartile  ranges,  and  the  whiskers  are  the 
non-outlier  maximum  and  minimum,  defined  as  the  mean  ±  2  standard 
deviations.  (Medians  are  linked  by  lines  to  emphasize  seasonal  trends,  and  not 
to  imply  interpolation.) 

We used 12 years (1990 to 2001) of daily weather data for crop simulations. Some
of  these  weather  years  (1997,  1999,  2001)  were  used  for  the  IM  process  and  its 
evaluation,  and  the  rest  were  used  to  obtain  genetic  coefficients  as  described  further 
below.  The  year  2001  was  also  used  for  evaluation  with  the  spatially-coupled  model. 

We  built  a  dataset  comprising  the  four  weather  variables  from  the  DSSAT 
minimum  data  set  (Hunt  and  Boote,  1998):  maximum  and  minimum  temperature,  total 
solar radiation, and precipitation. The temperature and rainfall data from mid-1999 to the
end  of  2001  were  collected  on-site  with  a  Davis  Instruments  Wizard  III  automatic 
weather  station  installed  near  the  center  of  the  field.  The  remaining  temperature  and 
rainfall  data  were  obtained  from  the  Midwest  Regional  Climate  Center  (MRCC),  and 
correspond  to  the  Murray  station  (36°  35'N,  88°  18'W).  All  the  solar  radiation  data  were 
also  obtained  via  the  MRCC  and  correspond  to  the  nearest  airport  with  a  solar  radiation 
record:  Paducah,  KY  (37°  04'N,  88°  46'W). 

We  used  the  WeatherMan  program  (Pickering  et  al.,  1994)  to  convert  the  weather 
data  into  the  DSSAT  format  and  to  estimate  values  to  fill  data  gaps.  We  also  checked 
radiation  data  quality  using  the  envelope  approach  as  proposed  by  Allen  (1996). 
Initial  conditions 

In  order  to  simplify  implementation  of  both  the  uncoupled  and  spatially-coupled 
SCMs,  we  assumed  that  simulations  began  in  all  cells  with  the  same  initial  conditions: 
drained  upper  limit  (DUL)  for  the  first  30  cm  of  the  soil  profile,  and  saturation  below 
that.  This  is  consistent  with  an  "average"  behavior  of  the  region  as  described  by  the  soil 
survey  (USDA,  1973). 

Genetic coefficients

For  the  years  1999  and  2001  we  used  the  Pioneer  33A14  hybrid.  For  1997  we  used 
the Pioneer 3281W hybrid. We obtained DSSAT genetic coefficients to characterize
growth  and  development  for  these  hybrids  by  adapting  an  existing  medium  season  hybrid 
parameter  set  in  the  DSSAT  3.5  MZCER980.CUL  data  file.  We  used  12  years  of  weather 
data  (1990-2001)  and  used  the  CERES  model  in  potential  yield  mode. 

We first adjusted 33A14, and then parameterized 3281W by modifying the 33A14
parameters. We calibrated the parameters sequentially; for 33A14, we first adjusted the
P1 parameter from 200 to 260 °C day and P5 from 800 to 960 °C day so that the
12-year mean time to flowering and maturity occurred 65 days after crop emergence and 55
days after flowering, respectively, as specified by Bitzer et al. (2003). Second, we
changed the fruit growth rate (G2) from 8.5 to 9.2 mg day⁻¹. Andrade et al. (1996)
reported a maximum fruit growth rate of 9.5 mg day⁻¹ for a similar hybrid grown under
optimal temperature. Modifications introduced to G2 increased the simulated seed weight
from 180 to 320 mg/seed, which adequately reproduced observed values (350 mg/seed)
for a hybrid similar to Pioneer 33A14 (Andrade et al., 1996). We also set G3 to a
maximum of 700 kernels/plant. These modifications allowed us to reproduce the yield
range and maximum yields attained in Murray, KY (Pearce and Poneleit, 1997) with the
33A14 hybrid.

The parameter values corresponding to 3281W were P1 = 280 °C day, P5 = 980
°C day, G2 = 6.5 mg day⁻¹, and G3 = 600. We set the photoperiod sensitivity parameter
at 0 for both hybrids, since we used a fixed planting date.

Analyses

We  defined  scenarios  as  different  runs  of  the  IM  process,  performed  using  different 
combinations  of  criteria.  We  defined  two  yield  history  criteria  scenarios  using  the  IM 
framework  and  the  uncoupled  model.  We  also  defined  two  yield-based  scenarios  for 
evaluating  the  coupled  SCM  (developed  in  Chapter  2),  and  for  comparing  its  results  with 
those  of  the  uncoupled  model.  We  compared  the  results  using  RMSE  and  parameter 
values. (Note that the RMSE was used for comparisons only; in both cases the fitting used an OWA
and the error function shown in equation 7-1.) We also discussed differences in simulated
and  observed  soil  water  content  for  2001  for  the  3-year  scenarios. 

We  also  evaluated  the  IM  framework  with  one  soil  map  unit  neighborhood 
criterion.  The  five  scenarios  are  shown  in  Table  7-1.  We  did  not  include  any  instances  of 
the  sensitivity  or  geostatistical  criteria  shown  in  Figure  7-4. 

Table 7-1. Different IM scenarios, showing number of instances of each criterion.

Name  Model      Yield history criterion  Neighborhood criterion  Description
                 instances                instances
1A    Uncoupled  2 (1997, 1999)           0                       Standard, yield-only IM
1B    Uncoupled  3 (1997, 1999, 2001)     0                       Standard, yield-only IM
1C    Uncoupled  0                        1                       Expert-only IM framework
2A    Coupled    2 (1997, 1999)           0                       Standard, yield-only IM
2B    Coupled    3 (1997, 1999, 2001)     0                       Standard, yield-only IM

The  selection  of  parameters  to  estimate  was  a  result  of  the  discussion  process.  We 
selected  a  subset  among  soil  depth,  fraction  of  available  water  (see  Chapter  2),  SCS  curve 
number, saturated hydraulic conductivity (KSAT), and plant density. After estimating the
selected  parameters,  we  plotted  their  values  for  the  different  simulation  locations,  and 
noted  their  similarities  and  differences  with  respect  to  soil-probe  field  observations, 
expert  opinion,  and  values  taken  from  the  soil  survey  and  published  lab  tests. 

Results and Discussion

Elevation,  Wetness  Index 

As  mentioned  previously,  elevation  was  sampled  7030  times  over  the  Suggs  4  field, 
which  has  an  area  of  about  17.6  ha.  The  resulting  high  sampling  density,  approx.  400 
samples/ha,  made  it  possible  to  use  narrow  (10-meter)  lag  classes  to  estimate  the 
elevation  (isotropic)  semivariogram,  while  keeping  a  large  number  of  pairs  of  points  per 
lag  class.  Indeed,  the  least-populated  lag  class  had  9337  pairs  of  points.  The  best  fit  was 
obtained with a Gaussian model (Figure 7-8) with parameter values of $C_0$ = 0.03 m²
(nugget); $C_0 + C_1$ = 4.069 m² (sill); $a$ = 428.16 m (effective range). The model fit the data
well ($R^2$ = 0.995).
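
As an illustration, fitted semivariogram models of this kind can be evaluated with the sketch below. The functional forms are an assumption: the text reports only the nugget, sill, and effective range, so the code adopts the common convention in which a model reaches about 95% of its partial sill at the effective range. The function and argument names are hypothetical.

```python
import math


def gaussian_semivariogram(h, nugget, sill, eff_range):
    """Gaussian model; reaches about 95% of the partial sill at h = eff_range."""
    if h == 0:
        return 0.0  # by definition; the nugget is the limit as h -> 0
    partial_sill = sill - nugget
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * (h / eff_range) ** 2))


def exponential_semivariogram(h, nugget, sill, eff_range):
    """Exponential model; reaches about 95% of the partial sill at h = eff_range."""
    if h == 0:
        return 0.0
    partial_sill = sill - nugget
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * h / eff_range))


# Elevation model parameters reported in the text (m^2, m^2, m).
gamma_at_range = gaussian_semivariogram(428.16, 0.03, 4.069, 428.16)
```

Under this convention the Gaussian model rises slowly near the origin, which is consistent with a smooth surface such as elevation, whereas the exponential model rises steeply at short lags.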

Figure 7-8. Semivariogram estimated from elevation data. The dotted line shows the data
variance. 


The elevation semivariogram model had a very low nugget effect
($C_1/(C_0 + C_1) = 0.993$), but it did not have a clear sill. The low nugget effect suggests a
smooth,  noise-free  RTK  GPS-derived  elevation  surface,  and  translates  into  high 
confidence  in  the  subsequent  resampling  (via  ordinary  kriging)  of  the  elevation  data  into 
a  regular  grid.  The  lack  of  a  sill  is  due  to  nonstationarity,  which  was  expected  given  that 
the field slopes downward toward the southeast. The nonstationarity is not a problem,
however,  given  the  high  spatial  density  of  the  dataset.  Kriging  algorithms  used  for  spatial 
interpolation  usually  restrict  the  number  of  points  that  contribute  to  a  point  estimate.  We 
used  the  default  ordinary  kriging  algorithm  implemented  in  Surfer  7.0  (Golden  Software, 
1999),  which  uses  the  24  nearest,  albeit  homogeneously  distributed,  points  to  the  location 
of  interest.  For  most  of  the  elevation  map,  this  condition  was  met  with  points  only  a  few 
meters  away  from  the  candidate  location.  At  such  short  distances,  the  field's 
nonstationarity  (at  a  scale  of  hundreds  of  meters)  is  irrelevant. 

The  resampled  elevation  data  are  shown,  in  wireframe  form,  in  Figure  7-9.  Note  the 
different  landscape  positions  in  the  field,  and  their  corresponding  soil  types  (Figure  7-2): 
the  ridge,  containing  Loring  soils;  the  slope  with  Grenada  soils,  and  the  lower  landscape 
positions  (surrounding  the  waterways)  with  Calloway  soils. 


Figure 7-9: Wire frame elevation map of the Suggs 4 field, clipped to the field boundary.
The x-axis and y-axis show UTM coordinates, and the z-axis shows elevation
above an arbitrary reference. (All expressed in meters.)



The wetness index map (Figure 7-10) shows additional information about the field,
including  artificial  drainage  ways,  shown  as  depressions  in  Figure  7-9  and  as  high-WI 
lines  running  north-south  in  Figure  7-10.  When  this  map  was  discussed,  the  farmer 
recalled  that  his  predecessor  had  made  them  by  moving  some  earth  with  a  bulldozer. 

Figure 7-10. Wetness index (Beven and Kirkby, 1979) calculated for the Suggs 4 field.
The  13  IM  framework  simulation  domain  locations  are  shown  as  crosses. 

Electroconductivity  (EC) 

The  electroconductivity  dataset  also  had  a  high  spatial  sampling  density;  we  used 
5-meter  lag  classes  to  estimate  its  isotropic  semivariogram.  The  least-populated  lag  class 
was  again  the  first,  with  10,612  pairs  of  points  for  surface  EC  and  9,876  for  deep  EC.  The 
best  semivariogram  model  fit  for  both  EC  layers  was  obtained  with  an  exponential  model 
(Figure 7-11, A and B) and parameter values of $C_0$ = 1.5 mS²/m² (nugget); $C_0 + C_1$ = 13.53
mS²/m² (sill); $a$ = 134.7 m (effective range) for the surface EC data, and $C_0$ = 2.37 mS²/m²;
$C_0 + C_1$ = 17.98 mS²/m²; $a$ = 84.3 m for the deep EC data. In both cases, the model fit the
data well ($R^2$ = 0.995 and $R^2$ = 0.968 for surface and deep EC data, respectively).

Figure 7-11. Semivariogram estimated from surface (A) and deep (B) electroconductivity
data.  The  dotted  line  shows  the  data  variance. 


The  surface  electroconductivity  maps  (Figure  7-12  A)  show  spatial  variability  of 
EC  in  the  Suggs  4  field.  Electroconductivity  has  been  shown  to  correlate  with  several  soil 
properties,  including  clay  content,  organic  matter  content,  water  content,  salinity  level, 
etc.  (Johnson  et  al.,  2001).  In  the  case  of  Suggs  4,  the  opinion  of  our  domain  experts  was 
that  the  variability  of  EC  would  indicate  differences  in  clay  content  (due  to  erosion  or  soil 
type),  or  in  soil  moisture  (primarily  due  to  landscape-position-mediated  spatial  water 
movement). We studied the EC maps together with other data (elevation, WI, soil type,
normalized  yield)  to  allow  us  to  distinguish  between  clay-  and  moisture-specific  effects. 

Figure 7-12. Veris electroconductivity maps of the field. Surface EC refers to a set of
coulters  that  represents  near-surface  conditions,  whereas  Deep  EC  integrates 
conditions  down  to  approximately  1.5  m.  The  13  IM  framework  simulation 
domain  locations  are  shown  as  crosses. 


The EC maps (Figure 7-12) show erosion on the ridge and some high-EC areas
further east that could be either wet or high-clay. In the case of the areas around the
CaA(N) point, high EC is probably a combined effect of a higher clay content near the
surface, associated with the making of the waterway, and a higher water content in the
waterway proper. Northeast of the GrB point, high EC is probably due to erosion, given
the  presence  of  a  small  (low-WI)  hilltop  there.  In  the  region  located  southeast  of  the  Hn 
point,  high  EC  may  be  associated  with  wetness  due  to  extremely  poor  drainage, 
combined  with  the  presence  of  some  clay  particles  brought  down  by  erosion  from  higher 
landscape positions. Finally, the elevated EC of the region north of the GrA point is
apparently associated with mechanical soil disturbances during the making of an old road
that traversed the field in an east-west direction.

Soils

The  ridge  on  the  western  edge  of  Suggs  4  field  (Figure  7-9)  is  covered  with  a 
Loring  B  soil,  which  is  characterized  by  erosion  and  a  fragipan.  As  discussed  above,  the 
erosion  is  consistent  with  the  high  EC  shown  in  Figure  7-12.  The  restrictive  fragipan  does 
not  translate  into  flooding  because,  due  to  its  elevated  landscape  position  and  relatively 
high  gradient,  the  area  mostly  has  a  low  wetness  index  (Figure  7-10);  it  does  not  receive 
water  from  other  landscape  positions,  and  it  should  be  able  to  drain  laterally  if  necessary. 
The  fragipan  does  impose  another  problem,  however,  which  is  aggravated  by  erosion 
effects:  lack  of  water. 

Next  below  the  Loring  soil  is  the  Grenada  soil,  which  also  has  a  fragipan,  although 
perhaps  not  as  well  developed  in  some  areas  as  the  one  underlying  the  Loring  soils  (J. 
Mcintosh,  pers.  comm.)  This  region  is  less  eroded  than  the  Loring  soils,  and  has  received 
silt  from  them;  thus,  EC  is  lower  (Figure  7-12)  and  the  distance  to  the  fragipan  should  be 
higher.  The  WI  is  higher  than  that  of  the  Loring  soils  (Figure  7-10)  due  to  lower  slopes 
and  greater  contributing  areas.  This  is  expected  to  result  in  more  availability  of  water,  and 
thus  in  greater  yields. 
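
The dependence of the wetness index on slope and contributing area can be made concrete with a short sketch. It uses the topographic index of Beven and Kirkby (1979), $\ln(a / \tan\beta)$, where $a$ is the upslope contributing area per unit contour length and $\beta$ is the local slope angle. The function name, the slope floor, and the example cell values are hypothetical and are not taken from the Suggs 4 data.

```python
import math


def wetness_index(specific_area_m, slope_rad, min_tan_slope=1e-3):
    """Topographic wetness index ln(a / tan(beta)).

    specific_area_m: upslope contributing area per unit contour length (m).
    slope_rad: local slope angle (radians).
    min_tan_slope: hypothetical floor on tan(beta) to avoid dividing by zero on flats.
    """
    tan_beta = max(math.tan(slope_rad), min_tan_slope)
    return math.log(specific_area_m / tan_beta)


# Illustrative cells only: a steep ridge cell with little upslope area versus
# a gentle footslope cell that collects water from a larger area.
ridge_wi = wetness_index(10.0, math.atan(0.05))
footslope_wi = wetness_index(500.0, math.atan(0.005))
```

A gentler slope and a larger contributing area both raise the index, which matches the ordering described for the Loring and Grenada landscape positions.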

Below the Grenada soils lie the Calloway soils. These also have a fragipan, more
clay  than  the  Grenada  soil,  and  are  wetter.  Maize  crops  typically  have  germination  and 
emergence  problems  in  this  region. 

Yield Maps

The  spatial  sampling  density  of  the  yield  data  was  higher  than  that  of  elevation  and 
EC.  This  allowed  us  to  use  narrower  (3-  or  5-meter)  lag  classes  to  estimate  the  isotropic 
yield  semivariograms.  In  all  cases,  the  best  fit  was  obtained  with  exponential  models. 

Figure  7-13  shows  the  semivariograms  for  all  the  years  considered.  Table  7-2 
shows  the  parameters,  a  measure  of  goodness  of  fit,  and  the  lag  class  width  used  in  each 
case. Nugget values were generally quite high ($C_1/(C_0 + C_1) < 0.873$), suggesting the
presence  of  high-frequency  noise  due  to  lags  and  the  complex  dynamics  of  crop 
redistribution  within  the  harvester  and  its  temporal  interaction  with  the  grain  flow  sensor, 
as  discussed  by  Birrell  et  al.  (1996)  and  Pierce  et  al.  (1997).  This  high-frequency  noise 
motivated  us  to  increase  the  number  of  points  contributing  to  the  interpolated  yield  maps 
to  32  from  the  default  (24)  used  in  Surfer. 

Table 7-2. Semivariogram parameters for the yield map data.

Year           $C_0$ (kg/ha)²  $C_0 + C_1$ (kg/ha)²  $a$ (m)  $r^2$   Lag class width (m)
1997 Maize     710,000         4,840,000             66.6     0.999   3
1998 Soybeans  61,400          202,000               37.2     0.974   5
1999 Maize     613,000         2,592,000             78.0     0.996   5
2001 Maize     383,000         3,022,000             58.2     0.993   5
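
The structured-variance ratio $C_1/(C_0 + C_1)$ quoted for the yield data can be recomputed from the nugget and sill values in Table 7-2. The sketch below is only a worked check of that arithmetic; the dictionary and function names are hypothetical.

```python
# (nugget C0, sill C0 + C1) in (kg/ha)^2, copied from Table 7-2.
YIELD_SEMIVARIOGRAMS = {
    "1997 maize": (710_000, 4_840_000),
    "1998 soybeans": (61_400, 202_000),
    "1999 maize": (613_000, 2_592_000),
    "2001 maize": (383_000, 3_022_000),
}


def structured_fraction(nugget, sill):
    """Share of the sill explained by spatial structure, C1 / (C0 + C1)."""
    return (sill - nugget) / sill


ratios = {
    year: structured_fraction(nugget, sill)
    for year, (nugget, sill) in YIELD_SEMIVARIOGRAMS.items()
}
```

The largest ratio, for 2001 maize, rounds to 0.873; the smallest, for 1998 soybeans, is about 0.70. Both are well below the 0.993 obtained for elevation, which is consistent with the noisier yield monitor data.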

Figures  7-14,  7-15,  and  7-16  show  the  resampled  yield  maps  used  for  building  the 
normalized  yield  (NY)  map.  Despite  the  interannual  spatial  yield  variability  shown,  some 
behaviors  were  conserved  between  years,  such  as  the  crescent-shaped  high-yield  zone 
stretching  southward  from  the  western  end  of  the  northernmost  waterway  almost  to  the 
southern  field  boundary,  the  high-yielding  Loring  and  Grenada  soils  in  the  northwest 
sector  of  the  field,  and  the  low-yield  spots  around  the  central  waterway.  These  features 
are  shown  more  clearly  in  the  normalized  yield  map  (Figure  7-17). 

Figure 7-13. Semivariograms fitted to the observed yield data.

Figure 7-14. Resampled 1997 maize yield data.

Figure 7-15. Resampled 1998 soybean yield data.

Figure 7-16. Resampled 1999 maize data.

Figure 7-17. Normalized three-year (1997, 1998, 1999) yield (NY) map. The marked
points show the 13 locations selected for simulation.

Updating the Soil Map, Selecting the Simulation Locations

Several  elements  of  the  normalized  yield  map  (Figure  7-17)  can  be  predicted  by 
inspecting  the  soil  map  (Figure  7-2)  and  elevation  map  (Figure  7-9):  the  lower  yields  of 
the  eroded  LoC2  soils  along  the  field  boundary  in  the  northwest  corner  of  the  field; 
relatively  high  yields  in  the  relatively  deep,  relatively  well-drained  Loring  (LoB)  and 
(especially)  Grenada  (GrB)  soils;  lower  yields  in  the  shallower,  drainage-impaired 
Calloway  soils,  etc. 


[Figure 7-18 map annotations: 1) This follows changes in EC and, to a lesser extent, yield.
Maybe this is a soil with a pan closer to the surface, such as PuB (Purchase B). 2) LoA? GrA?
Check slope (should be lower). 3) Why does this have higher EC? If it's a hump, perhaps
erosion. 6) Ca unless hump; otherwise eroded Gr? 7) LoA? GrA? Check slope (should be
lower). 9) Mysteriously low-yielding area.]

Figure 7-18. Summary of anomalies and candidate zones for additional soil map units
identified during discussion sessions with the domain experts.

Other temporally-stable behaviors were unexpected, however: for example, the
low-yielding zones in the north center and in the southwest corner of the field, and the
high-yielding spots in the center of the field. We collected and mapped the set of
unexpected behaviors (Figure 7-18), and generated the following hypotheses (numbered
as in Figure 7-18):

1) Zone 1 of Figure 7-18 (erosion-prone, low-WI landscape position; high EC; low
NY) corresponds to a very eroded Loring-like soil (Purchase series) with a
fragipan very close to the surface. Purchase soils (coarse-silty, mixed, thermic
Ochreptic Fragiudalfs) have been previously reported in a neighboring field
studied with a first-order soil survey (Mueller et al., 2003).

2)  The  characteristics  of  Zone  2  (medium  WI;  low  EC,  high  NY  with  respect  to 
other  regions  in  similar  landscape  positions),  the  previously  mentioned  crescent- 
shaped  region,  suggest  that  water  does  not  limit  growth  here  as  much  as 
elsewhere.  We  hypothesized  this  is  a  Grenada  soil,  either  exceptionally  deep,  or 
having  a  poorly  developed  fragipan. 

3)  Zone  3  (low  WI,  high  EC,  high  NY)  is  an  eroded  Grenada  soil  on  an  elevated 
landscape  position. 

4)  Zone  4  (very  high  WI,  high  EC,  low  NY)  is  a  very  wet  Henry  soil.  Henry  soils 
(coarse-silty, mixed, active, thermic Typic Fragiaqualfs) are common in low
landscape  positions  throughout  the  region  (SCS,  1973),  and  have  been  previously 
reported  in  a  neighboring  field  (Mueller  et  al.,  2003). 

5)  Zone  5  (high  WI,  low  surface  EC  indicating  low  erosion,  high  deep  EC  indicating 
water  or  clay,  low  NY)  is  a  Calloway  soil  with  drainage  problems. 

6)  Zone  6  (low  WI,  high  EC,  medium  NY)  is  an  eroded  Grenada  soil. 

7)  Zone  7  (low  WI,  low  EC,  high  NY)  is  a  deep  Loring  or  Grenada  soil,  similar  to 
that  of  Zone  2. 

8)  Zone  8  (low  WI,  low  EC,  high  NY)  is  a  deep  Grenada  soil,  as  in  Zone  2. 

9)  Zone  9  (low/medium  WI,  low  surface  EC,  high  deep  EC,  very  low  NY)  was  taken 
out  of  consideration  because  the  farmer  remarked  that  it  was  very  wet  (and 
usually  had  very  poor  stand  quality)  because  in  the  past  a  road-building  crew  had 
moved  earth  in  a  way  that  prevented  water  running  off  the  field  in  that  sector. 
Water  ponds  as  a  consequence. 

Table 7-3. Soil probe observations corresponding to the anomalies in Figure 7-18.

1  All  PuB2     Purchase soil. There are 7.5 cm of topsoil, 7.5 cm of transition. Fragic
                 properties begin at 30 cm. Roots should not go much further than 45 cm.
                 Very dry!

2  402  GrB(Ba)  23 cm of topsoil, the first 10 cm very clean; 10-23 cm, very mottled; 23-56 cm,
                 dominantly brown, highly-mottled gray subsoil; 56-81 cm, mottled gray
                 and brown (wetter); 81-122 cm, dominantly grey. No restrictions to 122 cm.
                 This is a Kurk soil, i.e. a Calloway without restrictions.

2  433  GrB(Ba)  Has mottling. Brown subsoil below 23 cm with redox features. At 61 cm
                 the (predominantly brown) subsoil becomes very mottled, still good soil,
                 with no restrictions. At 97 cm, hit gray & clay. No fragipan until the bottom
                 (122+ cm), but clay may be restrictive. Behaves like a Grenada.

3                Very thick topsoil, 23 cm. Subsoil gets darker. Virtually no root restrictions
                 to 76 cm. Some redox at 66 cm. Minor restriction at 107 cm. Grey begins at
                 89 cm. Water perches there. Very mild fragic properties at 122 cm. The best
                 soil we've seen. It looks depositional. It's more like a Loring than a
                 Grenada, but 6 m SE of here it's more like a Grenada.

4                Wet! First few cm has a lot of OM. Total topsoil about 10-12 cm. Below
                 that, grey subsoil. Deeper grey (more clay) at 56 cm. Saturated at 81 cm,
                 still a grey, heavy clay mess. Approximately 30% clay, 65-68% silt. Fragic
                 properties at 112 cm. Henry-like, but the pan is a little deeper. Behaves like
                 Routon. Call it Henry.

5  405  CaA(W)   Topsoil 18 cm. Some mottling, wetness. Below it there's grey (water). Grey
                 but not too clayey down to 51 cm. Drying out from 66 cm. At 91 cm we
                 start getting more fragic properties. It's not a strong pan but it's got a fragic
                 character. It's a Calloway.

6  A08  GrA      This looks like a disturbed Grenada soil. It's saturated from 38 cm down.
                 The material feels alluvial. No mechanical restrictions until 91 cm, when it
                 gets harder. There seems to be a pan at 107 cm.

7  401  N/A      20 cm of beautiful topsoil; hit pan at 51 cm; 12-38 cm, beautiful subsoil;
                 38-51 cm, mottled subsoil. Pan at 51 cm. This soil is an eroded Loring,
                 based more on depth to the pan than on topsoil thickness.

8  404  GrB(N)   Compacted zone at 13-20 cm. Grenada-like. There are lots of weeds around
                 this point. Clay (grey) starts at 81 cm. At 91 cm the probe goes no further.

9  419  N/A      Mottling at 13 cm, grey at 25. "Going through butter" for the next 38, but
                 very grey, very wet. No restrictions. Significant pick up of clay at 76 cm.
                 Depth 122+ cm. The "well-drained" pan soils will be better for row crops
                 than these worse-drained soils with no mechanical restrictions. Calloway.
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Soil  Probe  Observations 

We made a series of soil probe observations (Table 7-3) to address our previously
mentioned soil-map-anomaly hypotheses. A soil scientist (Jerry Mcintosh, NRCS), a crop
consultant (John Potts, AgConnections, Inc.), and I traversed the field, making and
discussing soil probe observations (Figure 7-19).


Figure  7-19.  Field  observations  with  a  soil  probe.  John  Potts  (left)  and  Jerry  Mcintosh 
(right)  examining  a  soil  core.  (Photo  courtesy  of  Rick  Murdock.) 

Most of our observations (Purchase soil of Zone 1, Henry soil of Zone 4, Calloway
soil of Zone 5, Grenada soil of Zone 8, Calloway soil of Zone 9) supported their
corresponding hypotheses. The apparently disturbed soil at Zone 6 was initially
surprising, but the farmer recalled that in the past a road crossed the field east-west,
running near the Zone 6 observation point. He suggested that material removed from the
roadbed could have been deposited in Zone 6.

Our two observations in Zone 2 revealed somewhat different soils, but both shared
the lack of a strong limiting horizon. For simplicity we decided to consider them as part
of the same soil type, which we called GrB(Ba).

The  observation  in  Zone  3  revealed  a  soil  more  appropriately  described  as  Loring 
than  Grenada.  However,  a  quick  probe  a  few  meters  away  from  the  sampling  location 
showed  a  Grenada  soil.  We  decided  to  label  the  zone  as  Grenada  B  (GrB). 

The results from Zone 7 were somewhat contradictory. The topsoil and subsoil did
not seem eroded, which is consistent with a Grenada soil. However, the depth to the pan
was shallow (51 cm), which is inconsistent with the zone's high yield. Other observations
conducted nearby showed fragipan depths of approximately a meter. This discrepancy
suggests that this zone may have spatially variable fragipan development.

The  Simulation  Domain 

As explained before, we wished to simulate a reduced but representative set of
locations in the field, attempting to represent a distinct soil type with each location. We
simplified the candidate regions, consisting of the union of the original soil types of
Figure 7-2 and the additional zones shown in Figure 7-18, to the 13 soil mapping units
shown in Figure 7-20. For this we discarded Zone 7 (as probably very similar to Zone 2)
and Zone 9 (considering it impractical to simulate due to its runoff problems), and chose
one point per resulting soil type, located in or near a temporally stable NY zone (Figure
7-16). This latter criterion follows from the conclusions of Chapter 4 regarding the
convenience of sampling at temporally stable locations as a way to maximize the
predictive power of a spatiotemporal simulation model.

Figure 7-20. Set of 13 soil types used as the IM framework simulation domain. Free ovals
label the soil map units containing them. Ovals contained within larger ovals
denote new soil units arising from our anomaly-driven hypotheses.

Knowledge  Elicitation:  Populating  the  Neighborhood  Criteria 

Figure  7-21  shows  the  soil  depth  criterion  data,  elicited  from  interaction  with  the 
experts.  In  the  discussion  sessions  related  to  this  criterion  we  made  inferences  about  the 
soil  depth  in  successive  pairs  of  adjacent  soil  mapping  units  based  on  all  the  available 
information,  reached  a  consensus  regarding  our  level  of  confidence  in  the  inferred  spatial 
relationship,  and  represented  it  as  an  arc  on  the  background  bitmap  of  the  field  using 
Microsoft  Visio  Professional  2002  (Microsoft  Corp.,  2001).  We  revisited  and 
successively refined the network over several meetings.

An example of the inference process follows: EC, landscape position (LP), and our
soil  probe  data  suggested  that  the  LoB  soil  is  deeper  than  the  PuB  soil,  because  the  latter 
has  higher  EC,  a  more  erosion-prone  (high-gradient)  LP,  a  very  shallow  depth  to  the 
fragipan,  and  a  very  eroded  topsoil  (Table  7-3).  Similarly,  EC,  NY,  and  LP  suggested  that 
the  LoB  is  deeper  than  the  LoC2  soil;  in  LoC2,  EC  is  higher,  the  slope  is  greater,  and  the 
NY  is  lower  than  in  LoB.  However,  our  confidence  in  this  relationship  was  lower  than  in 
the  previous  case. 
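
The elicited relationships can be thought of as labeled arcs between adjacent mapping units. The following Python sketch is one possible encoding, not the representation used in the study: the `Arc` class, its field names, the `violated_arcs` helper, and the 5% tolerance for "similar to" arcs are assumptions made for illustration. Only the two relationships and their relative confidence come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    """One elicited relationship between two adjacent soil mapping units."""
    left: str        # unit expected to have the larger (or similar) value
    right: str
    relation: str    # "GT" (greater than) or "SIM" (similar to)
    confidence: str  # "higher" or "lower" (hypothetical labels)


# The two arcs of the inference example in the text.
SOIL_DEPTH_ARCS = [
    Arc("LoB", "PuB", "GT", confidence="higher"),
    Arc("LoB", "LoC2", "GT", confidence="lower"),
]


def violated_arcs(arcs, values, sim_tolerance=0.05):
    """Return the arcs that a candidate set of values fails to satisfy.

    sim_tolerance is an assumed relative tolerance for "SIM" arcs.
    """
    failed = []
    for arc in arcs:
        a, b = values[arc.left], values[arc.right]
        if arc.relation == "GT":
            ok = a > b
        elif arc.relation == "SIM":
            ok = abs(a - b) <= sim_tolerance * max(abs(a), abs(b))
        else:
            raise ValueError(f"unknown relation: {arc.relation}")
        if not ok:
            failed.append(arc)
    return failed
```

With hypothetical depths of 120, 60, and 90 cm for LoB, PuB, and LoC2, `violated_arcs` returns an empty list; raising the LoC2 depth above the LoB depth would flag the second arc.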

The  soil  depth  neighborhood  criterion  (Figure  7-21)  describes  soil  depth,  a  crop 
model  parameter.  The  same  kind  of  criterion  can  be  used  with  a  model  output  such  as 
wetness.  Figure  7-22  shows  the  wetness  neighborhood  criterion,  a  network  of  expected 
inequalities  of  plant-extractable  soil  water  in  the  first  45  cm  of  soil.  This  criterion  was 
populated  using  the  inferential  methods  described  above.  Finally,  Figure  7-23  shows  a 
network  of  relationships  of  an  input  variable,  plant  density,  that  is  environmentally 
dependent  (i.e.,  can  vary  from  year  to  year)  but  cannot  be  predicted  by  CERES. 

We  chose  to  use  two  of  the  criteria  shown  above,  soil  depth  and  wetness,  in  the  IM 
framework  implementation  for  this  study.  We  set  the  constraint  function  (Figure  7-6) 
parameters  of  both  criteria  to  the  arbitrary  values  shown  in  Table  7-4. 
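
To make the role of the thresholds and of the small constant ε more concrete, the sketch below shows one way a piecewise-linear constraint function could be evaluated. It is an assumption-laden illustration, not the function of Figure 7-6: it assumes that the argument `d` is a normalized difference between the values at two neighboring units, that the function returns a degree of satisfaction between 0 and 1, and that the two inner thresholds of each "greater than" row bound a linear ramp. The names `ramp_up`, `greater_than`, `similar_to`, and `EPS` are hypothetical.

```python
EPS = 1e-9  # stands in for the very small constant of Table 7-4


def ramp_up(d, lo, hi):
    """Return 0 at or below lo, 1 at or above hi, and a linear ramp in between.

    The division by (hi - lo) is why hi must exceed lo, even if only by EPS.
    """
    if d <= lo:
        return 0.0
    if d >= hi:
        return 1.0
    return (d - lo) / (hi - lo)


def greater_than(d, lo=-0.05, hi=0.05 + EPS):
    """Assumed satisfaction of a "greater than" constraint (inner thresholds of Table 7-4)."""
    return ramp_up(d, lo, hi)


def strongly_greater_than(d):
    """Same ramp with the "strongly greater than" thresholds 0 and EPS."""
    return ramp_up(d, 0.0, EPS)


def similar_to(d, inner=0.017, outer=0.05 + EPS):
    """Assumed trapezoid: 1 for |d| <= inner, 0 for |d| >= outer."""
    return 1.0 - ramp_up(abs(d), inner, outer)
```

Under these assumptions, the "strongly" variants behave almost like step functions, while the weaker variants tolerate small differences of the wrong sign.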

Table 7-4. Values adopted for the neighborhood criterion's constraint thresholds; ε is a
very small constant used for avoiding numerical division-by-zero errors.

Constraint               Thresholds (x values, left to right)
Strongly greater than    -100, 0, ε, 100
Greater than             -100, -0.05, 0.05+ε, 100
Weakly greater than      -100, -0.1, 0.1+ε, 100
Strongly less than       -100, -ε, 0, 100
Less than                -100, -0.05-ε, 0.05, 100
Weakly less than         -100, -0.1-ε, 0.1, 100
Strongly similar to      -100, -0.025-ε, -0.0083, 0.0083, 0.025+ε, 100
Similar to               -100, -0.05-ε, -0.017, 0.017, 0.05+ε, 100
Weakly similar to        -100, -0.1-ε, -0.033, 0.033, 0.1+ε, 100

Figure 7-21. Soil depth neighborhood criterion. The figure shows the Suggs 4 field, the
soil  mapping  units  it  is  divided  into,  and  the  soil  depth  relationships  elicited 
from  the  experts.  Free  ovals  label  the  soil  map  units  that  contain  them.  Ovals 
nested  within  larger  ovals  denote  soil  units  not  shown  in  the  soil  survey  but 
identified  from  anomalies  in  EL,  WI,  EC,  and  NY  maps  and  fieldwork  with  a 
soil  probe.  The  ovals'  labels  correspond  to  the  NRCS  nomenclature  for  soil 
series.  The  part  of  the  labels  in  parentheses  differentiates  between  different 
map  units  containing  the  same  soil  type.  With  respect  to  the  arc  labels,  GT  is 
"greater  than".  For  example,  the  Loring  B  (LoB)  soil  is  expected  to  be  deeper 
than  the  highly  eroded  Purchase  B2  (PuB2)  soil;  "Sim"  means  "similar  to". 
The dotted lines represent a weaker relationship than the solid lines.

Figure 7-22. Wetness neighborhood criterion. The figure shows the Suggs 4 field, the soil
mapping  units  it  is  divided  into,  and  the  wetness  relationships  elicited  from 
the  experts.  Free  ovals  label  the  soil  map  units  that  contain  them.  Ovals  nested 
within  larger  ovals  denote  soil  units  not  shown  in  the  soil  survey  but  identified 
from  anomalies  in  EL,  WI,  EC,  and  NY  maps  and  fieldwork  with  a  soil  probe. 
The  ovals'  labels  correspond  to  the  NRCS  nomenclature  for  soil  series.  The 
part  of  the  labels  in  parentheses  differentiates  between  different  map  units 
containing  the  same  soil  type.  With  respect  to  the  arc  labels,  WT  is  "wetter 
than".  For  example,  the  Henry  (Hn)  soil  is  expected  to  be  wetter  than  the 
Grenada  B  (GrB)  soil,  and  "Sim"  means  "similar  to".  The  dotted  lines 
represent a weaker relationship than the solid lines.

Figure 7-23. Plant density neighborhood criterion. The figure shows the Suggs 4 field,
the  soil  mapping  units  it  is  divided  into,  and  the  plant  density  relationships 
elicited  from  the  experts.  Free  ovals  label  the  soil  map  units  that  contain  them. 
Ovals  contained  within  larger  ovals  denote  soil  units  not  shown  in  the  soil 
survey  but  identified  from  anomalies  in  EL,  WI,  EC,  and  NY  maps  and 
fieldwork  with  a  soil  probe.  The  ovals'  labels  correspond  to  the  NRCS 
nomenclature  for  soil  series.  The  part  of  the  labels  in  parentheses 
differentiates  between  different  map  units  containing  the  same  soil  type.  With 
respect  to  the  arc  labels,  GT  is  "greater  than".  For  example,  the  Grenada  B 
(GrB)  soil  is  expected  to  have  a  greater  plant  density  than  the  Henry  (Hn)  soil, 
and  "Sim"  means  "similar  to".  The  dotted  lines  represent  a  weaker 
relationship than the solid lines.

Knowledge Elicitation: Parameter Sensitivity and Parameter Selection

Causal  diagrams  (Eden  et  al.,  1992;  Howard  and  Matheson,  1984)  are  directed 
graphs  that  model  perceived  cause-effect  relationships  among  different  variables. 
Variables are represented as nodes, and the relationships between them as arcs. Typically,
relationships are either positive (labeled with a "+") or negative ("-"). A positive
relationship  is  such  that  a  change  in  the  predecessor  node  (for  example,  an  increase) 
causes  a  change  in  the  same  direction  (i.e.  an  increase)  in  the  successor  node.  In  a 
negative  relationship,  the  changes  in  predecessor  and  successor  have  opposite  directions. 
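
The sign convention above implies that the net effect along a chain of arcs is the product of the arc signs. The short Python sketch below illustrates this; the node names and arcs are hypothetical and are not taken from the causal diagram developed in this study.

```python
# Hypothetical signed arcs (predecessor, successor) -> +1 or -1.
# These are illustrative only, not the arcs of the study's diagram.
CAUSAL_ARCS = {
    ("slope", "runoff"): +1,
    ("runoff", "infiltration"): -1,
    ("infiltration", "soil water"): +1,
}


def path_sign(arcs, path):
    """Return the net sign (+1 or -1) of a change propagated along a path of nodes."""
    sign = 1
    for pred, succ in zip(path, path[1:]):
        sign *= arcs[(pred, succ)]  # KeyError if the path uses a missing arc
    return sign
```

In this toy network an increase in slope propagates to soil water with a negative net sign, because the path contains exactly one negative arc.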

As a first step in selecting the parameters to estimate in the IM framework, we
made a causal diagram model of water balance in the field. I initially populated it
based on the CERES water balance model (Ritchie, 1985), and we successively modified it
during group discussions. We adopted the following ideas from Carley and Palmquist
(1992) regarding knowledge representation in model form:

•  Mental  models  are  internal  representations. 

•  Language  is  the  key  to  understanding  mental  models  (i.e.,  mental  models  can  be 
represented  linguistically). 

•  Mental  models  can  be  represented  as  networks  of  concepts. 

•  The  meaning  of  a  concept  for  an  individual  is  embedded  in  its  relations  to  other 
concepts  in  the  individual's  mental  model. 

•  The  social  meaning  of  a  concept  is  not  universally  defined.  Instead,  it  is  defined 
through  the  intersection  of  individuals'  mental  models. 

Thus,  we  used  language  as  our  primary  information  exchange  tool,  through  a  series 
of  (open  and  directed)  questions  I  posed  to  the  domain  experts.  Aware  of  the  lack  of 
universality  mentioned  above,  we  supported  our  use  of  language  with  drawings  and 
diagrams such as the ones shown in Figure 7-24.

Figure 7-24. Using diagrams to support the discussion and knowledge elicitation process.
Jerry  Mcintosh  (NRCS)  explains  spatial  water  movement  in  the  region. 

The  resulting  causal  diagram  (Figure  7-25)  provides  a  simplified  representation  of 
how  the  water-related  aspects  of  our  spatial  crop  model  work,  including  the  relationships 
between  readily  available  data  (bottom  row  of  nodes),  the  relevant  physical  processes  in 
the  system  (central  region  of  the  diagram),  and  model  parameters  (top  row  of  nodes). 
Some  of  these  parameters  (e.g.,  roughness),  and  the  processes  they  control  (e.g., 
subsurface  flow)  do  not  exist  in  the  CERES  water  balance,  however. 

Making the causal map helped the domain experts and me attain, in terms of the
previously mentioned ideas of Carley and Palmquist (1992), common meanings for the
relevant water balance concepts. We could then proceed to selecting parameters to
estimate with the IM framework. (With these data we could also have populated
parameter sensitivity criteria; see Figure 7-4.)

Figure 7-25. Conceptual water balance model expressed as a causal map. The "LPD"
label  refers  to  a  landscape-position-dependent  relationship.  The  "SB"  label 
refers  to  a  relationship  dependent  on  non-scalar  texture  data. 

Figure 7-26 shows the record of a discussion session. In this session we revisited a
previous discussion about the limiting factors of each of the domain units shown in
Figure 7-20. I also presented some additional questions. In all cases, the participating
experts were asked to reach a consensus about their level of confidence in their answer,
expressed as a probability or a verbal quantifier ("Certain", "Probable", etc.). To obtain
comparable  results  across  sessions,  and  to  provide  the  experts  with  a  common  conversion 
from  verbal  quantifiers  to  probabilities,  we  used  a  modified  version  of  the  scale  proposed 
by  Renooij  and  Witteman  (1999),  shown  in  Figure  6-1. 
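
A conversion of this kind amounts to a lookup from verbal anchors to numbers. The sketch below shows the idea in Python. The anchor labels are those visible in the session record (Figure 7-26), but the numeric values are placeholders chosen for illustration; they are not the values of the modified scale in Figure 6-1, which should be substituted before any real use. The names `VERBAL_SCALE` and `to_probability` are hypothetical.

```python
# Placeholder probabilities for illustration only; replace them with the
# values of the modified scale actually used in the elicitation sessions.
VERBAL_SCALE = {
    "certain": 1.00,
    "probable": 0.85,
    "expected": 0.75,
    "not expected": 0.25,
    "improbable": 0.15,
    "impossible": 0.00,
}


def to_probability(answer):
    """Convert an expert's answer (a number in [0, 1] or a verbal quantifier) to a probability."""
    if isinstance(answer, (int, float)):
        p = float(answer)
    else:
        p = VERBAL_SCALE[answer.strip().lower()]  # KeyError for unknown labels
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")
    return p
```

Storing every answer as a probability, whatever form it was given in, is what makes results comparable across sessions.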

In  a  subsequent  discussion  session,  the  data  shown  in  Figure  7-26  were  combined 
into  a  common  representation  (Figure  7-27)  including  the  group's  perceptions  on  how  the 
limiting  factors  are  related  to  CERES  water  balance  parameters.  We  then  chose  the 
parameters to estimate (Table 7-5).

Discussion Session #7

Participants
•  Jerry Mcintosh (NRCS)
•  Rick Murdock (Ponderosa Farms, AgConnections, Inc.)
•  Andres Ferreyra (University of Florida)

Agenda: Soil limiting factors
•  We're going to review the yield limiting factors Andres discussed with John & Rick
on 7/5/02.

[Each panel below carried a confidence scale (Certain, Probable, Expected, Not expected,
Improbable, Impossible); the marked positions are not recoverable.]

PuB
•  What limits yield is primarily the shallow depth to the fragipan. Also, very shallow
topsoil depth.
•  Relevance: harder to plant, worse seed-soil contact, less available nutrients, N & P
will run off, denitrify, etc.

LoC2
•  Limiting factor: High water output, low water input.
•  Secondary factor: fragipan depth.
•  Greater runoff due to slope.
•  Does not get extra water from anywhere else on the landscape.

LoB
•  Limiting factor: [illegible] layer at 36" (John's choice).
•  Divergent landscape position, but not as strong as LoC2.

GrB
•  Yield-limiting factor: limiting layer. When clay at bottom dries, it will get hard even
if it does not have a strong pan (John's choice. Jerry demotes it to secondary).
•  Jerry thinks depth to restrictive features (fragipan or +clay%) is the way to express
this.
•  Note that GrB gets more water from upslope than Lo.
•  The area would be expected to yield better than the rest.

GrB (Ba)
•  Dynamic balance between early spring wetness problems (refers to mottling depth)
and hardness of clay when the soil gets dry. This is a weather-year-dependent
behavior, but the wetness problem is NOT typical, i.e. it's [illegible] toward the clay
hardness induced limitations.
•  This region showed no significant mechanical impedance down to at least 4'.
•  Jerry: physico-chemical processes.

CaA (N)
•  Dynamic balance between early spring wetness problems (refer to mottling depth &
grayness) and hardness of clay when the soil gets dry. This is a
weather-year-dependent [remainder illegible]
•  "Late-season soybeans are the one case where a perched water table can be a good
[text cut off]

CaA (W)
•  Limiting factor: accumulation of water (high WI) affecting stand quality in the early
season.
•  This area could benefit for late season soybeans.

Hn
•  Limiting factor: poor stand quality due to excessive water in the early season.
•  Jerry adds the hardness of the clay when it dries out.
•  See points 407 and 428.

CaA (S)
•  Limiting factor: poor stand quality due to excessive water in the early season.
•  When the grey stuff dries out, it will limit root system depth. This item not as
certain because "the brown stuff in the subsoil gives you more room to wiggle".
[More] forgiving.
•  Why? This is the wettest of the three Calloway A regions. Somewhat similar to Henry
soil behavior although subsoil is brown, not grey.
•  May have surface effects (ponding).
•  Note how in 408 and A08 we noted that the topsoil was not good, yet the subsoil was
great. Consider infiltration + convergent LP as limiting.

CaB2 [Jerry: CaA (E)]
•  Limiting factor: start of the clay at 18" in point 415 will play a part if it gets dry in
the summer, pan at 32" also.
•  This region probably gets rid of its extra water into the waterways, both in surface &
subsurface flows.

GrA
•  Limiting factor: start of the clay at 21" in point 408 will play a part if it gets dry in
the summer, pan at 46" also.
•  We infer this from point 408, which is transitional between Grenada and Calloway.

CaB
•  Limiting factor: start of the clay will play a part if it gets dry in the summer, pan
also.
•  Points 416, 418, A04.
•  It does not have the excess-water-related problems of other Calloway regions.
•  A04 is probably not very trustworthy because there has been a lot of traffic out
there.

GrB(N)
•  Limiting factor: pan at 36", start of the clay at 24" in point 449 will play a part if it
gets dry in the summer.
•  Points 420, 404, 449, transition to 418.
•  Jerry: This is probably a continuation of the GrB (Ba) "banana", that was interrupted
by the waterway.

Topsoil thickness
•  Andres: Why is topsoil thickness important?
•  Jerry: The critical property of topsoil thickness is its association with organic matter.
This carries with it a greater CEC & potential availability of nutrients.
•  Jerry: The greater the OM, the greater the infiltration.
•  Rick: greater OM% -> greater water holding cap.
•  Rick: maybe we can add an arrow in the process network from OM to Infiltration.
•  Andres: OM & residue as different entities?
•  Rick: "OM is residue + time".

The 48" limit
•  Andres: What would you propose as an upper limit to the soil depth allowable in the
IM process for places where we didn't find restrictions up to 48"?
•  Jerry (started recording here): the soil gets good again below the pan.
•  Jerry: I don't really care what's going on below 48". In my mind, if you've got 48" of
decent soil, that's enough. Landscape position will modify the validity of this
statement, however.

Figure 7-26. Record of a discussion session with domain experts.

Figure 7-27. Compact representation of the limiting factor data. The upper set of nodes
represents  CERES  model  parameters.  The  central  set  of  nodes  represents 
processes  occurring  in  the  field.  The  lower  set  of  nodes  corresponds  to  the  13 
simulation  domain  locations.  The  upper  set  of  arrows  shows  how  the  crop 
model  parameters  are  related  to  the  limiting  factors.  The  lower  set  of  arrows 
shows  to  what  extent  each  factor  influences  yield  in  each  soil  type. 


Table 7-5. Crop model parameters and ranges used in IM framework.

Parameter  Definition                                            Units        Minimum  Maximum  N° Points
KSAT       Saturated hydraulic conductivity, bottom soil layer   cm d⁻¹       0.0001   0.1      16
CN2        SCS runoff curve number                               -            72       92       17
SDEP       Soil depth                                            cm           45       165      17
PPOP       Plant density                                         plants m⁻²   3        8        17

A  noteworthy  result  of  the  parameter  selection  process  is  that  the  soil  water  holding 
limits  (DUL,  LL,  SAT)  were  not  chosen  as  parameters  to  estimate  using  the  IM 
framework.  The  soil  scientist  indicated  that  the  total  available  water  (DUL  -  LL)  would 
remain  practically  constant  throughout  the  field,  although  the  lower  limit  (LL)  could  be 
expected to vary somewhat. These results are similar to those of Ritchie et al. (1999).

However, variation of the LL can conceivably influence the model results
independently  of  its  relationship  with  DUL.  The  LL  is  used  by  the  CERES  water  balance 
routine  to  estimate  unsaturated  hydraulic  conductivity,  which  in  turn  is  used  to  compute 
potential  daily  root  water  uptake  (Ritchie,  1998).  Variability  in  the  LL  could  thus  modify 
the  crop's  behavior  under  supply-limited  conditions. 

In  order  to  assess  the  expected  level  of  variation  of  the  LL  across  the  field,  we 
obtained  textural  fractions  from  laboratory  measurements  made  by  the  National  Soil 
Survey  Center  (NSSC)  on  samples  of  three  of  the  soils  found  in  the  field:  Calloway 
(NSSC,  1991a),  Grenada  (NSSC,  1991b),  and  Loring  (NSSC,  1991c).  We  also  got 
textural  fractions  for  the  Henry  soil  from  the  Soil  Survey  of  Calloway  County  (USDA 
SCS,  1973).  We  used  the  Saxton  pedotransfer  functions  (Saxton  et  al.,  1986)  to  estimate 
the  corresponding  LL,  DUL,  and  SAT  values.  The  results,  interpolated  to  fit  in  the  soil 
layer structure used by CERES, are shown in Table 7-6. Note the great similarity among
the soil water characteristics of the different soil types. We consequently adopted a single set
of  characteristics  for  the  whole  field,  using  values  averaged  over  the  different  soil  types. 
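
For readers unfamiliar with pedotransfer functions, the following Python sketch shows the general form of such a calculation. It is not the code used in this study. The regression coefficients are the ones commonly quoted for Saxton et al. (1986) and should be checked against that paper before use. Equating LL with the water content at 1500 kPa and DUL with the water content at 33 kPa is a common convention that is assumed here, and the function name `saxton_1986` and the example texture (20% sand, 20% clay) are hypothetical.

```python
import math


def saxton_1986(sand_pct, clay_pct):
    """Estimate (LL, DUL, SAT) as volumetric fractions from sand and clay percentages.

    Uses the tension relation psi = A * theta**B (psi in kPa), with A and B
    computed from texture. Coefficients are as commonly cited for Saxton et al.
    (1986); treat them as an assumption to verify against the original paper.
    """
    s, c = sand_pct, clay_pct
    a = 100.0 * math.exp(-4.396 - 0.0715 * c - 4.880e-4 * s**2 - 4.285e-5 * s**2 * c)
    b = -3.140 - 0.00222 * c**2 - 3.484e-5 * s**2 * c
    ll = (1500.0 / a) ** (1.0 / b)   # water content at 1500 kPa (assumed LL)
    dul = (33.0 / a) ** (1.0 / b)    # water content at 33 kPa (assumed DUL)
    sat = 0.332 - 7.251e-4 * s + 0.1276 * math.log10(c)
    return ll, dul, sat
```

For the hypothetical texture of 20% sand and 20% clay, this sketch gives values of roughly 0.12, 0.30, and 0.48 for LL, DUL, and SAT, the same order of magnitude as the entries of Table 7-6.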


Table 7-6. Soil water holding characteristics obtained by applying the Saxton
pedotransfer functions to textural fractions taken from the literature.

Depth   Loring                 Calloway               Grenada                Henry
(cm)    LL     DUL    SAT      LL     DUL    SAT      LL     DUL    SAT      LL     DUL    SAT
5       0.140  0.330  0.510    0.140  0.320  0.510    0.140  0.330  0.510    0.140  0.320  0.510
15      0.140  0.330  0.510    0.140  0.320  0.510    0.140  0.320  0.510    0.140  0.320  0.510
30      0.140  0.330  0.510    0.120  0.320  0.510    0.120  0.320  0.510    0.120  0.320  0.510
45      0.140  0.330  0.510    0.120  0.320  0.500    0.155  0.330  0.510    0.120  0.320  0.500
60      0.140  0.330  0.510    0.150  0.310  0.510    0.150  0.320  0.510    0.150  0.310  0.510
75      0.130  0.320  0.500    0.140  0.320  0.500    0.140  0.320  0.500    0.140  0.320  0.500
90      0.120  0.310  0.500    0.130  0.320  0.500    0.140  0.310  0.500    0.130  0.320  0.459
120     0.120  0.310  0.500    0.130  0.320  0.500    0.140  0.310  0.500    0.130  0.320  0.500
150     0.120  0.310  0.500    0.130  0.320  0.500    0.140  0.310  0.500    0.130  0.320  0.500
180     0.120  0.310  0.500    0.130  0.320  0.500    0.140  0.310  0.500    0.130  0.320  0.500

Observed Yields in the IM Framework Domain

Figure  7-28  shows  the  observed  crop  yield  in  the  13  locations  of  interest  for  the 
three  years  under  study.  The  locations  are  ranked  by  average  yield  over  the  three  years. 
The  lines  linking  the  points  on  the  graph  do  not  imply  spatial  interpolation  (some  of  the 
adjacent  locations  in  the  graph  are  not  contiguous  in  space,  e.g.  LoB  and  CaA(S));  they 
are  used  to  highlight  yield  trends. 

Note how the highly eroded PuB2 soil had the lowest average yield, followed
mostly by shallow, poorly drained Calloway soils with relatively high clay content. At
the other end, the GrB(Ba) soil had the highest average yield, followed by the GrB and
GrB(N) soils. These soils, especially the GrB(Ba), are deeper than the rest, and are
considered the best soils in the field (J. Potts, pers. comm.).


Figure  7-28.  Observed  crop  yield  in  the  13  locations  of  interest. 

Despite the ranking made for visualization purposes, the yields in the three years
under study are not directly comparable. The Suggs 4 field was planted with the same
hybrid (Pioneer 33A14, yellow) in 1999 and 2001, but was planted with a somewhat
lower-yielding white maize (Pioneer 3281W) in 1997. In order to compare the three
years' behavior better, we used relative yields. Figure 7-29 shows the same yields of
Figure 7-28, expressed as fractional deviations with respect to each year's mean yield
across the 13 locations. Again, the lines are meant to simplify visualization, and do not
imply any form of interpolation.

Figure 7-29. Observed relative crop yield in the 13 locations of interest for the two
calibration years (1999 and 2001) and validation year (1997).


Figure  7-28  shows  that  crop  yield  was  consistently  higher  in  2001  than  in  1999  at 
the  13  locations.  Figure  7-29  shows  that  in  relative  terms,  the  yield  in  2001  varied  little 
around  the  mean.  The  variation  in  1999  was  higher,  and  in  1997  was  higher  still, 
especially  for  the  two  highest-yielding  soil  types.  Below  we  analyze  the  amount  and 
timing  of  rainfall  during  the  three  seasons  to  help  clarify  the  reasons  for  this  observed 
interannual  variability. 
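
The relative yields of Figure 7-29 are fractional deviations from each year's mean across locations. A minimal Python sketch of that calculation follows; the function name `relative_yields` and the yield values in the usage note are hypothetical and are not data from this study.

```python
def relative_yields(yields):
    """Express one year's yields as fractional deviations from that year's mean.

    yields maps a location label to its yield for a single year.
    """
    mean = sum(yields.values()) / len(yields)
    return {loc: (y - mean) / mean for loc, y in yields.items()}
```

For example, hypothetical yields of 8000, 10000, and 12000 kg/ha at three locations have a mean of 10000 kg/ha and become relative yields of -0.2, 0.0, and 0.2. Applying the function separately to each year removes the overall difference between hybrids and seasons while keeping the spatial pattern.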
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1       31      61      91     121     151     181  211 
Day  of  the  year 

Figure  7-30:  Cumulative  rainfall  from  Jan.  1  to  Aug.  23  during  1997,  1999,  and  2001. 


Figure 7-30 shows the cumulative rainfall in the three years of interest. Note how,
in all three years, there was a similar amount of rainfall in the weeks preceding the
planting  date  of  DOY  105.  None  of  the  crops  lacked  water  in  early  growth,  but  the 
situation  in  the  critical  window  around  flowering  was  different. 
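
The comparisons above rest on two simple quantities: the running total of daily rainfall and the total within a window of days. The Python sketch below shows both. The function names, the four-week window, and the synthetic rainfall series in the usage note are assumptions made for illustration; only the planting date of DOY 105 comes from the text.

```python
from itertools import accumulate

PLANTING_DOY = 105  # planting date given in the text


def cumulative(daily_mm):
    """Return the running total of a daily rainfall series (index 0 is DOY 1)."""
    return list(accumulate(daily_mm))


def rain_in_window(daily_mm, first_doy, last_doy):
    """Total rainfall for days first_doy..last_doy, inclusive (DOY 1 is index 0)."""
    return sum(daily_mm[first_doy - 1:last_doy])
```

With a synthetic series of 1 mm per day for DOY 1 to 235 (Aug. 23 in a non-leap year), `cumulative` ends at 235 mm and `rain_in_window(series, PLANTING_DOY - 28, PLANTING_DOY - 1)` returns 28 mm for the four weeks before planting.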

Figure  7-31  shows  this  in  greater  detail.  There  was  abundant  rainfall  before 
flowering  in  2001,  which  served  to  build  a  good  soil  water  supply.  After  flowering,  there 
was  hardly  any  more  rain  during  the  critical  window,  so  there  was  ample  available  solar 
radiation,  and  the  ears  set  a  large  number  of  seeds.  The  situation  in  1999  was  somewhat 
different:  a  few  days  without  rain  before  flowering,  and  ample  rain  (and  consequently, 
less  solar  radiation)  after  flowering.  This  is  consistent  with  the  relatively  constant  yield 
difference  between  1999  and  2001  across  the  field,  shown  in  Figure  7-28. 

The  1997  season  was  different;  it  had  the  least  amount  of  rainfall  during  (or 
immediately before) the critical window. Thus, differences across the landscape in soil
depth, CN2, and KSAT probably played an important role in determining the relative
differences  in  grain  number  and  consequently,  yield,  shown  in  Figure  7-29. 
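
To give a sense of how much the curve number alone can change the water entering the soil, the sketch below applies the textbook SCS curve number equation at the two CN2 bounds of Table 7-5. This is the standard form of the method with an initial abstraction of 0.2 S; it is not the CERES runoff routine, which adapts the method, and the 50 mm storm is a hypothetical value.

```python
def scs_runoff_mm(rain_mm, cn):
    """Runoff (mm) from a rainfall event using the standard SCS curve number equation."""
    s = 25400.0 / cn - 254.0   # potential maximum retention, mm
    ia = 0.2 * s               # initial abstraction, mm
    if rain_mm <= ia:
        return 0.0             # all rain is retained
    return (rain_mm - ia) ** 2 / (rain_mm + 0.8 * s)
```

For a hypothetical 50 mm storm, this gives about 7 mm of runoff at CN2 = 72 and about 31 mm at CN2 = 92, so the range allowed in the IM framework spans a large difference in infiltrated water when rainfall is scarce.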

This situation is important in the context of the differences in sensitivity between
the runoff and soil water holding parameters in the DSSAT models, discussed in Chapter 2.
The crop set its grain number under landscape-invariant conditions in both calibration
years: good radiation and good soil water in 2001, and good water but lower radiation in
1999. The soil water holding parameters would therefore not have been sensitive in an
IM parameter estimate using only yield data from 1999 and 2001, and would have been
estimated poorly. This lends additional support to the idea of incorporating additional
information to constrain the parameter estimation process.

Figure 7-31. Rainfall during the crop season. Simulated flowering and physiological
maturity dates are shown with arrows.

Evaluation

We examined in detail the parameter values, yield, and soil water content at four
locations on the field (Figure 7-32) located on the slope at approximately regular distance
and elevation intervals (Figure 7-33).

Figure 7-32. Evaluation locations, shown on the normalized yield

Figure 7-33. Relative position on the landscape of the four locations used for evaluation.
Elevations are expressed with respect to an arbitrary reference. Distances are
measured along the polygonal line linking the locations.

We chose these locations because of their contrasting characteristics: a well-drained,
relatively high-yielding Loring soil; the Calloway A (W) soil which, although
relatively high in the landscape, has drainage problems; the high-yielding Grenada B (Ba)
soil on the slope; and the low-lying, poorly drained Henry soil.

Simulations with a neighborhood criterion

In  scenario  1C  (Table  7-1)  we  populated  the  OWA  objective  function  using  only 
one  criterion,  soil  depth  neighborhood.  Figure  7-34  shows  the  results  of  five  realizations 
of  its  parameterization  process.  Bars  of  the  same  color  denote  the  same  realization  across 
the  different  figures. 

When the IM framework is run with only one objective function input, such as the
soil depth neighborhood criterion (Figure 7-21), the parameters are not constrained to a
unique solution: many combinations of parameters can produce equally good
solutions. In fact, the five solutions of Figure 7-34 all produced optimal (i.e.,
zero-valued) objective function results. Note the great variability across realizations
of CN2 (Figure 7-34A), KSAT (Figure 7-34C), and PPOP (Figure 7-34D), the last of which,
as shown in Table 7-5, was only allowed to vary between 3 and 8 plants/m². This was
expected, because those parameters are not constrained in any way.

The behavior of soil depth (SLDEP; Figure 7-34B), the parameter that controls the
criterion, is different. Soil depth parameter values are more stable across realizations
and soil types. Moreover, the stability depends on how constrained the corresponding soil
mapping unit is. For example, the GrB(Ba) unit SLDEP is less variable than the LoB
SLDEP. This happens because the constraints to which the LoB SLDEP parameter is
subjected are less restrictive than those constraining GrB(Ba): soil depth at GrB(Ba) is
expected to be greater than that of its three neighbors, each one of which is in turn
subjected to multiple constraints. In contrast, some of the LoB soil's neighbors are
unconstrained beyond their relationship with the LoB soil.

Figure 7-34. Five realizations of parameterization using the IM framework with only one
objective function input, the soil depth neighborhood criterion. The five
different realizations correspond to the five colors shown (e.g., the white bars
in all the panels correspond to the same realization).

IM  with  coupled  and  uncoupled  models 

Figure 7-35 shows the parameter estimates from scenarios 1A and 2A, the 2-year IM
processes. In general, the parameter values are not remarkably different between the
coupled and uncoupled models. The parameter values of the LoB soil, uppermost in the
toposequence, are necessarily equal because the top cell of the coupled model does not
receive runon from above. However, some differences exist downslope, caused by the
need of the uncoupled model to compensate for the lack of runon by means of a deeper soil
or a lower runoff curve number, as previously shown in Chapter 2 (Figure 2-6).


Figure 7-35. Parameter estimates of 2-year coupled and uncoupled model IM scenarios.

Figure 7-36. Errors and comparison of yields for 2-year coupled and uncoupled
scenarios relative to observed values.

Figure 7-37. Parameter estimates of 3-year coupled and uncoupled model IM scenarios.

Figure 7-38. Errors and comparison of yields for 3-year coupled and uncoupled model IM
scenarios relative to observed values.

The relative values of CN2 (Figure 7-35A) for different landscape positions are
realistic: the CaA(W) soil is situated in a high-WI zone (due to a low gradient rather
than a high contributing area), where water can sometimes collect. This is consistent with
low CN2 values. The GrB(Ba) and LoB soils are on a ridge and slope, respectively. This
is consistent with higher CN2 values, although LoB would be expected to have a higher
CN2 than GrB(Ba). Finally, the Hn soil is at a high-WI position situated low in the
landscape. The coupled model routes all of the runoff from higher landscape positions
to the Hn soil, assuming that the runoff flows down the slope in sheet form and that all
of it is available for infiltration downslope. This sheet runoff assumption is unrealistic,
and creates a large CN2 difference between the uncoupled and coupled models: the CN2
value of the Hn soil grew to allow the coupled model to rid itself of excess runon,
whereas the CN2 value of the spatially-uncoupled model decreased to compensate for the
absence thereof.
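
The runon mechanism discussed above can be illustrated with a minimal sketch. It assumes
the standard SCS curve number equation (not necessarily the exact runoff routine of the
DSSAT models) and the sheet-flow assumption that all runoff leaving a cell becomes runon
to the next cell downslope. The function names, the rainfall amount, and the CN2 values
are hypothetical and are not taken from our scenarios.

```python
def scs_runoff(water_mm, cn2):
    """Runoff (mm) from the standard SCS curve number equation."""
    s = 25400.0 / cn2 - 254.0      # potential maximum retention (mm)
    ia = 0.2 * s                   # initial abstraction (mm)
    if water_mm <= ia:
        return 0.0
    return (water_mm - ia) ** 2 / (water_mm + 0.8 * s)


def cascade(rain_mm, cn2_by_cell, coupled=True):
    """Infiltration (mm) per cell, ordered from the top to the bottom of the slope.

    In the coupled case, all runoff from a cell is added to the water reaching
    the next cell downslope (sheet-flow assumption). In the uncoupled case,
    each cell only receives rainfall.
    """
    infiltration = []
    runon = 0.0
    for cn2 in cn2_by_cell:
        water = rain_mm + (runon if coupled else 0.0)
        runoff = scs_runoff(water, cn2)
        infiltration.append(water - runoff)
        runon = runoff             # passed to the next cell downslope
    return infiltration


# Hypothetical 50 mm event on four cells with identical curve numbers.
coupled = cascade(50.0, [80, 80, 80, 80], coupled=True)
uncoupled = cascade(50.0, [80, 80, 80, 80], coupled=False)
```

In this sketch the top cell infiltrates the same amount in both cases, while every
downslope cell infiltrates more water in the coupled case. An uncoupled model can only
match that extra water by changing its own parameters, for example by lowering CN2.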

Soil depth (Figure 7-35B) values in the LoB soil were somewhat higher than
expected  for  an  eroded  soil  with  a  fragipan.  However,  Loring  soils  may  have  poorly 
developed  fragipans  (J.  Mcintosh,  pers.  comm.),  which  may  allow  roots  to  penetrate 
beyond  the  top  of  the  fragipan.  Soil  depth  for  the  CaA(W),  GrB(Ba),  and  Hn  soils  is 
slightly  lower  than  expected  (Table  7-3)  in  the  case  of  the  coupled  model.  This  may 
reflect  both  the  experts'  perception  that  the  effective  soil  depth  of  the  CaA(W)  soils  may 
be  reduced  because  of  clay  at  the  bottom  of  the  profile  (Figure  7-27),  as  well  as  the 
aforementioned  lack  of  realism  of  the  sheet-runon  assumption.  In  the  uncoupled  model 
case, soil depth was increased to compensate for the lack of runon.

Yields were simulated accurately by both the coupled and uncoupled models
(Figure 7-36A), although the error increased markedly for the CaA(W) soil, in which
yields were overestimated by the model in 1997. Expert opinion (Figure 7-26)
suggested that yield is limited in this soil by excessive wetness in the early season, which
adversely impacts stand quality. This effect is captured by the decreased PPOP (Figure
7-35D). Moreover, the primary contribution to the RMSE in the CaA(W) soil came from
the 1997 season, which had higher rainfall in the weeks before planting (Figure 7-30).
The model captured the effects of excess water by reducing plant stand density, but it
could not capture the effects of standing water on stand uniformity, which is what
probably caused the extremely low observed yield in 1997. Pommel and Bonhomme
(1998) showed that a non-uniform stand has a greater impact on yield than low plant
density in a uniform stand. The latter is the situation assumed by CERES.

A  noteworthy  result  is  that  the  PPOP  value  for  the  LoB  soil  is  lower  than  expected; 
the  domain  experts  predicted  that  PPOP  should  be  highest  in  the  LoB  soil,  rather  than  in 
GrB(Ba). This is consistent with the unexpectedly high soil depth and unexpectedly low
CN2 values in LoB: the IM process spuriously compensated for an excessive availability
of water with a decreased plant population.

Results for the 3-year scenarios, 1B and 2B (Figures 7-37 and 7-38), were similar
to  those  of  the  2-year  scenarios,  with  some  noteworthy  exceptions.  The  LoB  case  is  now 
more  consistent  with  expert  opinion,  although  there  may  have  been  some  tradeoff 
between  a  large  CN2  value  and  a  large  SLDEP.  Another  noteworthy  effect  is  the  "trade" 
that  the  uncoupled  model  made  in  its  compensation  for  the  lack  of  runon  in  lower 
landscape  positions:  instead  of  primarily  reducing  its  runoff  curve  number  (CN2)  as  in 
the 2-year scenario, it greatly increased its soil depth. This behavior may be caused by the
discrete nature of the parameter space, or by effects specific to the 2001 season (the
biased-weather concept presented in Chapter 2). Both possible causes support adding
spatial-context-dependent constraints to the parameterization process to
reduce the influence that unaccounted-for yield-reducing factors have on parameter
estimation.

Soil  water  observations  (Figures  7-39  through  7-54)  reveal  another  source  of  error 
identified  in  Chapter  2:  lack  of  knowledge  of  initial  conditions.  As  previously  described, 
we  assumed  DUL  as  the  initial  soil  water  content  for  all  simulations.  This  implies  the 
assumption  that  excess  water  would  eventually  drain.  In  some  cases  (Figures  7-46,  7-49, 
7-52),  the  assumption  was  valid,  because  initially  high  (above-DUL)  observed  soil  water 
content  eventually  converged  to  the  simulated  water  content.  In  many  other  cases  our 
initial  conditions  assumption  seems  unreasonable  in  light  of  the  results,  which  highlight 
the  drainage  limitations  of  some  of  these  soils.  Most  notable  among  these  cases  were  the 
lower layers of the Grenada and Henry soils (Figures 7-50, 7-53, 7-54).

The  LoB  soil  stands  out  as  the  one  with  the  most  poorly  simulated  water  balance. 
Observed  data  correspond  to  a  soil  with  little  or  no  crop  water  demand.  This  was 
probably  the  result  of  the  tube  being  located  off  the  row  and  near  the  center  of  the  furrow, 
and  thus  exposed  to  lower  demand  from  the  maize  crop,  as  observed  by  Prinsloo  et  al. 
(2003).  In  our  case,  the  farmer  initially  avoided  planting  over  the  tubes,  creating  this 
artifact. In contrast, simulation in the CaA(W) soil was relatively successful, as was the
simulation for the first 30 cm of the Hn soil. The marked increase in observed soil water
in the lower layers of the Hn soil may be due to unaccounted-for subsurface flow.

Comparing the water content simulated by the spatially-uncoupled and coupled
models  (black  and  grey  lines,  respectively,  in  Figures  7-43  to  7-54),  differences  generally 
increase  lower  in  the  toposequence  and  later  in  the  season.  Especially  noteworthy  is  that 
the  soil  water  content  in  the  Hn  and,  to  a  lesser  extent,  GrB(Ba)  soils  is  actually  lower  in 
the  coupled  model  than  in  the  spatially-uncoupled  model.  This  is  not  due  to  a  greater 
demand  by  the  crop  simulated  by  the  coupled  model.  On  the  contrary,  in  the  Hn  soil  the 
spatially-uncoupled  model  simulated  more  transpiration  than  the  coupled  model  (Figure 
7-55).  The  answer  to  this  apparent  contradiction  lies  in  the  soil  depth  estimated  by  the  IM 
process  (Figure  7-40D),  over  twice  as  deep  in  the  spatially-uncoupled  model.  This  has  the 
result  that  water  demand  is  split  between  more  layers  in  the  spatially-uncoupled  model, 
reducing  the  demand  from  any  given  layer. 

Our  first  future  research  step  is  to  explore  new  parameterization  scenarios  of  this 
same  system,  using  a  full  integration  of  neighborhood  criteria  and  yield  history  criteria. 
We  expect  this  to  produce  more  realistic  results  than  the  scenarios  shown  in  Table  7-1, 
avoiding the inconsistencies and spurious parameter tradeoffs shown above.

Recommendations for Building Neighborhood Criteria

For  each  inconsistency  shown  in  the  previous  paragraphs,  interaction  with  experts 
had  a  priori  provided  the  information  necessary  to  avoid  spurious  behavior.  This 
knowledge  can  be  codified  as  neighborhood  criteria  and  evaluated  by  means  of  simple 
algebraic  equations  (Figure  7-6). 

Practical construction of neighborhood criteria can proceed as follows:

•  Build  a  common  understanding  of  the  field,  the  capabilities  of  crop  models,  and  the 
role  of  different  crop  model  parameters.  This  can  be  done  using  causal  diagrams  or 
Bayesian networks.

• Select the simulation locations; their spatial distribution will depend on the
temporal  stability,  and  spatial  covariance  structure,  of  yield  in  the  field  (Chapters  3 
and  4).  The  selection  process  should  be  hypothesis-driven,  and  done  in  the  context 
of  discussion  of  the  available  data  such  as  elevation,  electroconductivity,  wetness 
index,  and  yield  maps. 

•  Select  the  parameters  to  estimate  at  each  location.  This  can  follow  a  discussion  on 
the  factors  that  limit  yield  in  each  of  the  chosen  simulation  locations,  and  on  which 
crop  model  parameters  correspond  to  those  factors. 

•  Elicit  spatial  constraints  for  the  selected  criteria  for  all  pairs  of  locations 
corresponding  to  adjacent  soil  mapping  units.  We  coded  this  knowledge  in  map 
form  (Figures  7-21  to  7-23),  but  it  can  also  be  coded  directly  in  the  form  of  a 
triangular  matrix,  each  row  and  column  of  which  correspond  to  one  of  the 
simulation  locations.  Only  a  subset  of  this  matrix  contains  valid  information:  the 
cells  corresponding  to  adjacent  mapping  units.  Appendix  B  shows  our  matrix  for 
the  soil  depth  neighborhood  criterion,  as  well  as  the  code  (in  Borland  Pascal  7.0) 
for  evaluating  the  neighborhood  and  yield  criteria. 
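
The triangular-matrix coding described in the last step can be sketched as follows. This
is a minimal illustration in Python, not the Borland Pascal code of Appendix B. The
location list, the sign convention (1, -1, 0), the matrix entries, and the function names
are all hypothetical, and the score used here is a simple share of satisfied comparisons
rather than the piecewise linear functions of Figure 7-6.

```python
LOCATIONS = ["LoB", "CaA(W)", "GrB(Ba)", "Hn"]  # simulation locations

# Upper-triangular matrix of expert expectations (hypothetical values).
# EXPECT[i][j] for i < j:  1 -> location i expected deeper than location j,
#                         -1 -> location i expected shallower than location j,
#                          0 -> mapping units not adjacent, no information.
EXPECT = [
    [0, 1, -1, 0],
    [0, 0, -1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]


def adjacent_pairs(expect):
    """Yield (i, j, sign) for every matrix cell that holds valid information."""
    n = len(expect)
    for i in range(n):
        for j in range(i + 1, n):
            if expect[i][j] != 0:
                yield i, j, expect[i][j]


def fraction_satisfied(depths, expect):
    """Share of elicited pairwise constraints met by a candidate parameter set."""
    pairs = list(adjacent_pairs(expect))
    met = sum(1 for i, j, sign in pairs if sign * (depths[i] - depths[j]) > 0)
    return met / len(pairs)


# Candidate soil depths (cm), in the order given by LOCATIONS.
score = fraction_satisfied([100.0, 80.0, 120.0, 90.0], EXPECT)
```

Only the cells for adjacent mapping units carry information, so the evaluation loops over
the upper triangle and skips every zero entry.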

Neighborhood  criteria  provide  a  measure  of  compliance  with  a  priori  notions  of  a 
temporally  stable  pattern  of  the  model  input  or  output  under  consideration.  We 
implemented  the  neighborhood  criteria  by  aggregating  the  piecewise  linear  functions 
shown  in  Figure  7-6.  This  may  seem  arbitrary  when  compared  with  the  Spearman  rank 
correlation  method  used  to  evaluate  temporal  stability  in  Chapter  4  following  the  work  of 
Vachaud  et  al.  (1985).  However,  applying  the  rank  correlation  method  to  our  case  would 
require  having  an  expected  a  priori  ranking  of  the  model  input  (or  output)  of  interest 
(e.g.,  soil  depth,  wetness)  throughout  the  field,  and  calculating  the  Spearman  rank 
correlation  between  the  a  priori  ranks  and  the  ranked  data  corresponding  to  the  parameter 
set  under  consideration  in  each  IM  iteration. 

The fundamental problem with using Spearman rank correlation in this context is
its requirement for a global set of ranks, the elicitation of which may not be simple. Our
experts seemed comfortable with, and had high confidence when answering, questions
about paired adjacent mapping units (e.g., CaA(W) vs. LoB in Figure 7-20). However,
they felt markedly less certain when asked to compare globally (e.g., compare the soil
depth of CaA(W) with that of CaB). This can be accommodated by dividing the problem
into smaller portions, and aggregating the results of tests performed on subsets of the
total soil map. At the limit, when the values are evaluated two at a time, the
rank-correlation-based method is equivalent to our neighborhood criterion, scaled to a
[-1,1] interval instead of [0,1], and assuming a high-confidence "greater than" or
"less than" comparison in which X3 ≈ X4 (Figure 7-6), which thus produces extreme
(-1 or 1) results only. Our neighborhood criterion provides the additional capability of
allowing the expert to reduce the impact of a comparison in which he or she does not
have much confidence.
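
The two-at-a-time limit can be checked with a small sketch. It assumes one plausible
reading of the piecewise linear function of Figure 7-6: a ramp that scores the difference
between two neighbors as 0 at or below X3, 1 at or above X4, and linearly in between. The
function names and the numerical values are hypothetical. When the two breakpoints
coincide, the ramp becomes a step, and rescaling it to [-1,1] gives the same result as the
Spearman rank correlation computed on the pair.

```python
def ramp(diff, x3, x4):
    """Piecewise linear score in [0, 1] for the difference between two neighbors.

    Assumed shape: 0 at or below x3, 1 at or above x4, linear in between.
    A wide [x3, x4] interval is one way to express low confidence.
    """
    if x4 <= x3:                      # degenerate ramp: a step at x3
        return 1.0 if diff > x3 else 0.0
    return min(1.0, max(0.0, (diff - x3) / (x4 - x3)))


def spearman(a, b):
    """Spearman rank correlation for two sequences without ties."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda k: v[k])
        r = [0] * len(v)
        for rank, k in enumerate(order, start=1):
            r[k] = rank
        return r
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# A priori ranking: the first unit is expected to be deeper than the second.
prior_ranks = [2, 1]
candidate = [120.0, 90.0]             # hypothetical soil depths (cm)

rank_based = spearman(prior_ranks, candidate)
step_based = 2 * ramp(candidate[0] - candidate[1], 0.0, 0.0) - 1
```

With a wider ramp, for example `ramp(diff, 0.0, 40.0)`, a small difference in the expected
direction earns only partial credit, which is how a low-confidence comparison could be
given less influence in this sketch.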

An  alternative  method  for  expressing  and  aggregating  the  beliefs  of  experts  is  the 
Dempster-Shafer  theory  of  evidence  (Dempster,  1968;  Shafer,  1976).  This  theory  has  a 
strong  theoretical  basis,  but  it  requires  the  separate  handling  of  two  probabilities:  a  degree 
of  belief,  and  a  degree  of  plausibility.  Dempster-Shafer  theory-based  methods  have  been 
used in such successful expert systems as MYCIN (Shortliffe, 1976), but were not
practical for our case, in which we sought to develop a process that could enable users
(crop consultants, for example) to develop their own expert systems for spatial crop
model parameterization. Moreover, the Dempster-Shafer method can generate
counterintuitive results that are difficult to grasp (Zadeh, 1984). In contrast, particular sets of
constraints  that  can  cause  problems  with  the  evaluation  of  neighborhood  criteria,  such  as 
closed  loops,  can  be  easily  detected  and  corrected  by  the  user  on  a  map. 
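
Closed loops can also be flagged programmatically before the constraints are shown on a
map. The sketch below is an illustration only, not part of our framework: it treats each
"greater than" constraint as a directed edge between mapping units and uses a depth-first
search to find a cycle. The function name and the example constraints are hypothetical.

```python
def find_closed_loop(greater_than):
    """Return one cycle in a set of 'A > B' constraints, or None.

    greater_than maps each mapping unit to the units it is expected to exceed.
    A cycle such as A > B > C > A cannot be satisfied by any parameter set.
    """
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in greater_than.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:         # back edge: the loop closes here
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                loop = visit(nxt)
                if loop:
                    return loop
        path.pop()
        color[node] = BLACK
        return None

    for start in list(greater_than):
        if color.get(start, WHITE) == WHITE:
            loop = visit(start)
            if loop:
                return loop
    return None


# Hypothetical constraint sets: the first is consistent, the second is not.
consistent = {"GrB(Ba)": ["LoB", "Hn"], "LoB": ["CaA(W)"]}
inconsistent = {"A": ["B"], "B": ["C"], "C": ["A"]}
```

The returned list names the units along the loop, which the user could then locate and
correct on the map.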

Figure 7-39. Simulated and observed soil water data for the 2001 crop season, 0-15 cm
layer of the LoB soil location, using IM with 3 years of yield data and no
neighborhood criteria. These results are the same for both the spatially-uncoupled
and coupled models (scenarios 1B and 2B).

Figure 7-40. Simulated and observed soil water data for the 2001 crop season, 15-30 cm
layer of the LoB soil location, using IM with 3 years of yield data and no
neighborhood criteria. These results are the same for both the spatially-uncoupled
and coupled models (scenarios 1B and 2B).

Figure 7-41. Simulated and observed soil water data for the 2001 crop season, 30-45 cm
layer of the LoB soil location, using IM with 3 years of yield data and no
neighborhood criteria. These results are the same for both the spatially-uncoupled
and coupled models (scenarios 1B and 2B).

Figure 7-42. Simulated and observed soil water data for the 2001 crop season, 45-60 cm
layer of the LoB soil location, using IM with 3 years of yield data and no
neighborhood criteria. These results are the same for both the spatially-uncoupled
and coupled models (scenarios 1B and 2B).

Figure 7-43. Simulated and observed soil water data for the 2001 crop season, 0-15 cm
layer of the CaA(W) soil location, using IM with 3 years of yield data and no
neighborhood criteria. The black line corresponds to the spatially-uncoupled
model (scenario 1B); the grey line relates to the coupled model (scenario 2B).

Figure 7-44. Simulated and observed soil water data for the 2001 crop season, 15-30 cm
layer of the CaA(W) soil location, using IM with 3 years of yield data and no
neighborhood criteria. The black line corresponds to the spatially-uncoupled
model (scenario 1B); the grey line relates to the coupled model (scenario 2B).

Figure 7-45. Simulated and observed soil water data for the 2001 crop season, 30-45 cm
layer of the CaA(W) soil location, using IM with 3 years of yield data and no
neighborhood criteria. The black line corresponds to the spatially-uncoupled
model (scenario 1B); the grey line relates to the coupled model (scenario 2B).

Figure 7-46. Simulated and observed soil water data for the 2001 crop season, 45-60 cm
layer of the CaA(W) soil location, using IM with 3 years of yield data and no
neighborhood criteria. The black line corresponds to the spatially-uncoupled
model (scenario 1B); the grey line relates to the coupled model (scenario 2B).

Figure 7-47. Simulated and observed soil water data for the 2001 crop season, 0-15 cm
layer of the GrB(Ba) soil location, using IM with 3 years of yield data and no
neighborhood criteria. The black line corresponds to the spatially-uncoupled
model (scenario 1B); the grey line relates to the coupled model (scenario 2B).

Figure 7-48. Simulated and observed soil water data for the 2001 crop season, 15-30 cm
layer of the GrB(Ba) soil location, using IM with 3 years of yield data and no
neighborhood criteria. The black line corresponds to the spatially-uncoupled
model (scenario 1B); the grey line relates to the coupled model (scenario 2B).

Figure 7-49. Simulated and observed soil water data for the 2001 crop season, 30-45 cm
layer of the GrB(Ba) soil location, using IM with 3 years of yield data and no
neighborhood criteria. The black line corresponds to the spatially-uncoupled
model (scenario 1B); the grey line relates to the coupled model (scenario 2B).

Figure 7-50. Simulated and observed soil water data for the 2001 crop season, 45-60 cm
layer of the GrB(Ba) soil location, using IM with 3 years of yield data and no
neighborhood criteria. The black line corresponds to the spatially-uncoupled
model (scenario 1B); the grey line relates to the coupled model (scenario 2B).

Figure 7-51. Simulated and observed soil water data for the 2001 crop season, 0-15 cm
layer of the Hn soil location, using IM with 3 years of yield data and no
neighborhood criteria. The black line corresponds to the spatially-uncoupled
model (scenario 1B); the grey line relates to the coupled model (scenario 2B).

Figure 7-52. Simulated and observed soil water data for the 2001 crop season, 15-30 cm
layer of the Hn soil location, using IM with 3 years of yield data and no
neighborhood criteria. The black line corresponds to the spatially-uncoupled
model (scenario 1B); the grey line relates to the coupled model (scenario 2B).

Figure 7-53. Simulated and observed soil water data for the 2001 crop season, 30-45 cm
layer of the Hn soil location, using IM with 3 years of yield data and no
neighborhood criteria. The black line corresponds to the spatially-uncoupled
model (scenario 1B); the grey line relates to the coupled model (scenario 2B).

Figure 7-54. Simulated and observed soil water data for the 2001 crop season, 45-60 cm
layer of the Hn soil location, using IM with 3 years of yield data and no
neighborhood criteria. The black line corresponds to the spatially-uncoupled
model (scenario 1B); the grey line relates to the coupled model (scenario 2B).

Figure 7-55. Simulated and observed cumulative transpiration for the 2001 crop season,
Hn soil location, using IM with 3 years of yield data and no neighborhood
criteria. The black line corresponds to the spatially-uncoupled model (scenario
1B); the grey line relates to the coupled model (scenario 2B).

Figure 7-56. Simulated and observed cumulative transpiration for the 2001 crop season,
GrB(Ba) soil location, using IM with 3 years of yield data and no
neighborhood criteria. The black line corresponds to the spatially-uncoupled
model (scenario 1B); the grey line relates to the coupled model (scenario 2B).

Farmers, crop consultants, and soil scientists possess valuable knowledge that can
be  harnessed  to  constrain  the  parameterization  of  spatial  crop  models,  increasing  the 
realism  of  the  simulations.  Crop  consultants  can  be  trained  to  manage  the  discussion 
process  that  generates  the  knowledge  representations  for  formalizing,  for  their  specific 
conditions,  neighborhood  criteria  such  as  the  ones  shown  in  Figures  7-21  to  7-23. 

A  problem  with  the  IM  framework  proposed  herein  is  that,  to  date,  it  lacks 
analytical  ways  to  estimate  parameter  uncertainty.  However,  the  high  speed  at  which  the 
IM  framework  can  evaluate  neighborhood  criteria  (as  opposed  to  running  a  crop  model) 
suggests  a  development  route  in  which  the  knowledge-based  elements  of  the  IM 
framework  are  separated  from  running  the  crop  model.  Using  a  Monte  Carlo  approach,  it 
may  be  possible  to  estimate  a  joint  a  priori  parameter  distribution  for  use  in  a  formal 
parameterization  setting  that  combines  crop  models  with  a  tool  such  as  an  ensemble 
Kalman  filter  (Bostick  et  al.,  2003;  Koo  et  al.,  2003).  We  see  this  as  an  important  avenue 
for  future  research. 
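
A minimal sketch of this idea follows. It assumes simple rejection sampling, which is one
possible Monte Carlo approach and not a method we have implemented: parameter sets are
drawn uniformly within bounds and kept only if they satisfy every "greater than"
constraint, without ever running the crop model. The bounds, the constraints, and the
function name are hypothetical.

```python
import random


def sample_prior(bounds, constraints, n_draws=10000, seed=0):
    """Rejection-sample parameter sets that satisfy all 'a > b' constraints.

    bounds:      {location: (low, high)} allowed range for the parameter
    constraints: list of (a, b) pairs meaning value[a] > value[b]
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        draw = {loc: rng.uniform(lo, hi) for loc, (lo, hi) in bounds.items()}
        if all(draw[a] > draw[b] for a, b in constraints):
            accepted.append(draw)
    return accepted


# Hypothetical soil depth bounds (cm) and neighborhood constraints.
bounds = {loc: (40.0, 140.0) for loc in ["LoB", "CaA(W)", "GrB(Ba)", "Hn"]}
constraints = [("GrB(Ba)", "LoB"), ("GrB(Ba)", "CaA(W)"), ("GrB(Ba)", "Hn")]

prior = sample_prior(bounds, constraints, n_draws=2000, seed=1)
mean_depth = {loc: sum(d[loc] for d in prior) / len(prior) for loc in bounds}
```

The accepted draws form an empirical joint distribution in which the constrained
parameters are no longer independent. Summary statistics of that sample, such as the
means computed above, could then serve as prior information for a formal estimation tool.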

Michalewicz  and  Fogel  (2000)  stated  that  multi-objective  optimization  problems 
can  be  solved  either  by  simplifying  the  problem  so  traditional  methods  are  applicable,  or 
by  keeping  the  structure  of  the  problem  and  using  a  non-traditional  approach  for  its 
solution.  The  process  of  aggregating  the  multiple  criteria  into  a  unique  fitness  value 
implies  the  former  and,  subsequently,  changes  the  nature  of  the  original  optimization 
problem.  This  could  be  avoided  by  using  a  dominance-based  multi-objective  method  to 
produce  a  set  of  non-dominated  solutions  or  Pareto  surface  (Corne  et  al.,  2003). 
However,  at  this  stage  of  our  work,  the  value  that  a  Pareto  surface  may  have  for  a  crop 
consultant  is  unclear  to  us. 
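
For reference, the dominance relation behind such methods can be sketched in a few lines.
This is a generic illustration of a non-dominated filter, not an implementation of any
specific algorithm from the cited literature. It assumes that every criterion is an error
to be minimized, and the candidate values are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b on every criterion and better on one.

    Criteria are assumed to be errors, so lower is better.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(solutions):
    """Return the solutions that are not dominated by any other solution."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]


# Hypothetical candidates scored on two criteria (yield error, neighborhood error).
candidates = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
front = pareto_front(candidates)
```

Instead of a single best parameter set, the filter returns every candidate that cannot be
improved on one criterion without worsening another, which is the set a user would then
have to choose from.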

The OWA operator is a flexible tool for aggregating multiple sources of data. It is a
special  case  of  the  Choquet  integral  (Yager,  2002),  the  application  of  which  in  the  IM 
framework  would  allow  additional,  confidence-based  weighting  of  the  criteria.  This 
implies  that  a  priori  knowledge  regarding  confidence  in  a  knowledge  source  or  the 
results  of  a  particular  criterion  could  be  used  to  modify  their  influence  on  the  objective 
function  independently  of  the  OWA  weights.  This  is  a  topic  for  further  study. 
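
A minimal sketch of the OWA operator in its standard form shows why source-specific
confidence cannot be expressed through the OWA weights alone: the weights are attached to
rank positions after sorting, not to the criteria that produced the values. The function
name and the criterion scores are hypothetical.

```python
def owa(values, weights):
    """Ordered weighted average: weights apply to rank positions, not sources."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)      # largest value first
    return sum(w * v for w, v in zip(weights, ordered))


# Hypothetical scores from three criteria.
scores = [0.9, 0.4, 0.7]

as_maximum = owa(scores, [1.0, 0.0, 0.0])       # all weight on the largest score
as_minimum = owa(scores, [0.0, 0.0, 1.0])       # all weight on the smallest score
as_mean = owa(scores, [1 / 3, 1 / 3, 1 / 3])    # equal weights: arithmetic mean
```

Shuffling the order of `scores` leaves all three results unchanged, so any weighting by
confidence in a particular knowledge source has to be introduced by other means.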

Conclusions 

In  this  study  we  introduced  a  novel  concept  for  parameterizing  spatial  biophysical 
models:  a  framework  for  eliciting  and  using  expert  knowledge  (from  domain  experts  such 
as  the  farmer,  crop  consultants,  and  an  NRCS  soil  scientist),  together  with  historical  yield 
data,  in  an  inverse  modeling  context. 

We  qualitatively  and  quantitatively  defined  a  model  of  spatial  relationships 
between  model  inputs  or  outputs  over  space  (spatial  parameter  model,  or  SPM),  and 
applied  it  to  a  representative  case  study  using  locally  available  sources  of  knowledge.  The 
SPM consists of a series of criteria, each of which constrains a different model input or
output, that are aggregated using an ordered weighted averaging (OWA) operator.
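
As an illustration of the aggregation step, the following minimal Python sketch implements a generic ordered weighted averaging operator in the sense of Yager. It is not the Pascal unit listed in Appendix B; the function name `owa` and the three criterion scores are hypothetical. The weights attach to rank positions rather than to particular criteria, so the same operator can behave like a maximum, a minimum, or a mean.

```python
def owa(values, weights):
    """Ordered weighted average: weights apply to rank positions, not to sources."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ranked = sorted(values, reverse=True)  # largest value first
    return sum(w * v for w, v in zip(weights, ranked))


# Hypothetical scores returned by three criteria for one candidate solution.
criteria = [0.2, 0.9, 0.5]

as_max = owa(criteria, [1.0, 0.0, 0.0])    # all weight on the largest value
as_min = owa(criteria, [0.0, 0.0, 1.0])    # all weight on the smallest value
as_mean = owa(criteria, [1 / 3, 1 / 3, 1 / 3])
print(as_max, as_min, as_mean)
```

Shifting weight toward the first rank position lets the largest criterion value dominate the aggregate, and shifting it toward the last position lets the smallest one dominate.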

We  found  that,  based  on  existing  data  such  as  RTK  elevation  maps, 
electroconductivity  maps,  soil  survey  maps,  and  simple  field  observations  made  with  a 
soil  probe,  it  was  possible  to  identify  appropriate  simulation  locations,  to  select 
parameters  to  estimate,  and  to  populate  the  different  criteria  that  comprise  the  SPM. 

We  applied  our  framework  (using  criteria  based  on  yield,  or  on  soil  depth)  to  the 
parameterization  by  inverse  modeling  (IM)  of  simple  spatially-uncoupled  and  coupled 
crop  models.  The  models'  ability  to  reproduce  observed  yield  was  good  with  both  two 
and three years of yield data used for parameter estimation. However, their ability to
reproduce observed spatiotemporal soil water patterns was landscape-position-dependent.
There were several spurious compensations between parameters, which resulted in
parameter combinations that best simulated observed yields but were not consistent with
soil probe observations or expert opinions. The results of the two models were similar: the coupled model
did not produce noteworthy improvements over the spatially-uncoupled model, although
the latter clearly compensated for the lack of run-on by means of spurious parameter value
combinations. These artifacts can be countered using tools such as the soil map
neighborhood criteria developed herein.

Numerous  opportunities  for  further  research  exist,  especially  in  developing  a 
method  that  can  provide  the  user  with  estimates  of  parameter  uncertainty,  which  our  IM 
framework  currently  lacks.  Separating  the  knowledge-based  elements  of  the  IM 
framework  from  running  the  crop  model,  together  with  Monte  Carlo  techniques,  may 
provide  such  an  opportunity.  Our  IM  framework  could  be  used  to  estimate  a  joint  a  priori 
parameter  distribution  for  use  in  a  formal  parameterization  setting  combining  crop 
models  with  a  tool  such  as  an  ensemble  Kalman  filter. 


CHAPTER  8 
CONCLUSIONS 

The literature to date on the inverse-modeling-based parameterization of spatial
crop models is dominated by spatially uncoupled models, in which each location is simulated independently of the others. In
Chapter  2  we  studied  three  possible  sources  of  error  for  such  models:  error  from  biased 
weather  in  the  years  of  yield  data  used  for  the  parameterization  process,  errors  due  to  lack 
of  knowledge  of  initial  soil  water  conditions,  and  error  from  lack  of  spatial  coupling  and 
water  transport  among  different  landscape  locations. 

We proved analytically that the spatiotemporal infiltration behavior of a
coupled water balance model cannot be reproduced by modifying the
parameters of an uncoupled model. The corresponding yield prediction limitations of the
uncoupled model were confirmed, using an example, at both the parameter estimation
and validation stages.
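
One intuition for this result, offered as a sketch and not as the proof given in Chapter 2, is that an uncoupled location can never infiltrate more water than it receives as rain, whatever parameter value it is given, whereas a coupled downslope location also receives run-on. The two-location model below is hypothetical: the infiltration capacities, the rain series, and the function names are assumptions made for illustration only.

```python
def infiltration(rain, infil_cap, runon=0.0):
    """Split the water supplied in one time step into infiltration and runoff."""
    supply = rain + runon
    infil = min(supply, infil_cap)
    return infil, supply - infil


def downslope_infiltration(rain_series, cap_up, cap_down, coupled):
    """Infiltration at the downslope location for every time step."""
    result = []
    for rain in rain_series:
        _, runoff_up = infiltration(rain, cap_up)
        runon = runoff_up if coupled else 0.0  # uncoupled: upslope runoff is lost
        infil_down, _ = infiltration(rain, cap_down, runon)
        result.append(infil_down)
    return result


rain = [5.0, 30.0, 10.0]  # hypothetical rainfall per time step
coupled = downslope_infiltration(rain, cap_up=10.0, cap_down=40.0, coupled=True)
uncoupled = downslope_infiltration(rain, cap_up=10.0, cap_down=40.0, coupled=False)
print(coupled, uncoupled)
```

In the second time step the coupled downslope location infiltrates more than the rain that fell on it. No choice of `cap_down` in the uncoupled version can reproduce that value, because its infiltration is bounded by the rainfall itself.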

In  our  example,  however,  weather  biases  and  the  knowledge  of  initial  conditions 
greatly  impacted  the  predictive  capability  of  the  coupled  model,  and  had  less  effect  on  its 
uncoupled  counterpart. 

We  concluded  that  the  use  of  fully  coupled  spatial  crop  models  requires  high- 
quality  data.  Practical  precision  agriculture  applications  are  characterized  by  uncertain 
initial  conditions  and  the  possibility  of  biased  weather.  Under  these  circumstances,  the 
use  of  a  coupled  model  may  not  be  justified,  especially  for  low  landscape  positions. 

In  Chapter  3  we  explored  three  methods  for  solving  the  crop-scouting  problem  of 
concurrently obtaining an optimal spatial sampling scheme for a phenomenon of interest
(e.g., yield), and an optimal closed scouting path that links the locations of the sampling
scheme.  The  three  methods  belong  to  two  different  groups:  two  search  for  an  optimal 
sampling  scheme  and  then  link  the  sampling  locations  into  a  closed  scouting  path  by 
solving  the  Traveling  Salesman  Problem  (TSP).  The  remaining  method  solves  for  the 
sampling  locations  and  the  scouting  path  simultaneously,  using  a  modified  Kohonen  self- 
organizing  feature  map  (SOFM).  The  three  methods  provide  a  principled  approach  to  the 
design  of  crop-scouting  activities  as  a  form  of  spatial  sampling,  and  they  are  sufficiently 
quick  and  accurate  to  be  usable  in  practical  applications. 
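
The second stage of the two-stage methods, linking already chosen sampling locations into a closed tour, can be sketched as follows. The sketch uses a nearest-neighbour start followed by 2-opt improvement, which are generic TSP heuristics and not necessarily the solver used in Chapter 3; the coordinates are hypothetical.

```python
import math


def tour_length(points, order):
    """Length of the closed tour that visits points in the given order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def nearest_neighbour(points):
    """Greedy starting tour: always walk to the closest unvisited location."""
    unvisited = set(range(1, len(points)))
    order = [0]
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(last, points[i]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def two_opt(points, order):
    """Reverse tour segments for as long as a reversal shortens the tour."""
    improved = True
    while improved:
        improved = False
        for i in range(1, len(order) - 1):
            for j in range(i + 1, len(order)):
                candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if tour_length(points, candidate) < tour_length(points, order) - 1e-12:
                    order, improved = candidate, True
    return order


# Hypothetical sampling locations (corners of a unit square, in scrambled order).
samples = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
path = two_opt(samples, nearest_neighbour(samples))
print(path, tour_length(samples, path))
```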

The  TSP  methods  (MMKV+TSP  and  MMSD+TSP)  tended  to  make  slightly  shorter 
tours  than  the  SOFM,  although  the  three  methods'  tours  were  never  longer  than  the 
expert  opinions.  The  TSP  methods  also  typically  estimated  yield  slightly  better  than  the 
SOFM. When runtime is unconstrained (and a semivariogram is available), the
MMKV+TSP case seems most appropriate. Conversely, when runtime is strongly
constrained, MMSD+TSP may be more dependable. In intermediate situations the three
methods are practically equivalent.

In  Chapter  4  we  combined  the  scaled  semivariogram  technique  proposed  by  Vieira 
et  al.  (1991)  with  two  simulated  annealing  algorithms  to  reduce  the  number  of  locations 
necessary  to  describe  water  content  in  our  8-hectare  study  area  from  57  down  to  10 
points.  The  scaled  semivariogram  allowed  us  to  incorporate  data  from  several  dates,  both 
to  reflect  time-independent  behavior  of  water  content  and  to  compensate  for  the  relatively 
small  size  of  the  individual  datasets. 
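
The pooling idea behind a scaled semivariogram can be sketched in a few lines. The sketch assumes that the scaling factor for each date is that date's sample variance, which is one common choice; the factor actually used in Chapter 4 may differ. The function names and data are hypothetical.

```python
import math
from collections import defaultdict


def semivariogram(coords, values, lag_width):
    """Empirical semivariogram: mean of 0.5 * (z_i - z_j)^2 per lag bin."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            lag_bin = int(math.dist(coords[i], coords[j]) // lag_width)
            sums[lag_bin] += 0.5 * (values[i] - values[j]) ** 2
            counts[lag_bin] += 1
    return {b: sums[b] / counts[b] for b in sums}, dict(counts)


def scaled_semivariogram(coords, values_by_date, lag_width):
    """Divide each date's semivariogram by its variance, then pool by pair count."""
    pooled, weight = defaultdict(float), defaultdict(int)
    for values in values_by_date:
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
        gamma, counts = semivariogram(coords, values, lag_width)
        for b, g in gamma.items():
            pooled[b] += counts[b] * g / variance
            weight[b] += counts[b]
    return {b: pooled[b] / weight[b] for b in sorted(pooled)}


# Hypothetical transect with two sampling dates; the second date is a wetter
# copy of the first (every value multiplied by 3).
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
date1 = [1.0, 2.0, 4.0, 3.0]
date2 = [3.0, 6.0, 12.0, 9.0]
print(scaled_semivariogram(coords, [date1, date2], lag_width=1.0))
```

Because scaling removes the difference in magnitude between dates, the two dates in this example contribute identical scaled semivariograms, which is what allows small datasets from several dates to be pooled.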

Of the two simulated annealing algorithms, Spatial Simulated Annealing (van
Groenigen and Stein, 1998) produced more consistent results, i.e., greater repeatability,
than the Sacks and Schiller method (Sacks and Schiller, 1988), although the solutions
provided by both algorithms were quite similar. Running multiple instances of the
optimization process is recommended, especially if using the Sacks and Schiller method.

Our  proposed  method  predicted  water  content  across  the  validation  set  with 
relatively  low  errors:  over  70%  of  all  the  predicted  water  contents  had  an  error  within 
±10%,  acceptable  for  the  application  it  was  designed  for.  The  method  also  captured  the 
spatial  variability  of  water  better  than  regular  grids  or  randomly  generated  patterns. 
However, the scenarios based on SMSE (scaled mean squared error) performed better than
the scenarios using an SKV (kriging variance) criterion.

We  detected  temporal  stability  in  the  dataset.  This  phenomenon  implies  the 
existence  of  spatial  nonstationarity  of  the  water  content  across  the  field,  leading  to  the 
violation  of  kriging  assumptions  and  the  degradation  of  the  quality  of  kriging  estimates. 
However,  the  SMSE-based  optimization  scenarios  incorporated  temporally  stable 
extreme  (wet  and  dry)  points  into  the  optimal  subset,  using  them  to  capture  the 
nonstationary  behavior. 

Our  method  may  be  improved  by  combining  a  spatial  water  movement  model  with 
a  crop  simulation  model  to  provide  a  temporally  variable  trend  that  can  be  used  to 
eliminate  possible  spatial  nonstationarity. 

In Chapter 5 we showed that the runtime of a simulated annealing-based crop
model parameterization process was greatly reduced through the reuse of simulation
results across successive iterations of the algorithm and across locations within an
environment.

The performance of the modified simulated annealing algorithm we used was
parameter-value-dependent. However, a conservative parameter combination was found
(α_c = 0.995, α_h = 1.000, cf = 0.0001, and m > 5) that ran much faster than a grid search,
its runtime tending asymptotically to that of the grid search as the number of locations of
interest grew, while converging to objective function values (and the corresponding
parameter combinations) practically identical to the global optima determined using the
grid search method.

Adoption  of  the  proposed  algorithm  can  produce  runtime  reductions  on  the  order  of 
25%  -  75%,  depending  on  the  geometry  of  the  simulation  domain.  Additionally,  it  can  be 
used  to  parameterize  coupled  spatial  crop  models  in  which  parameter  values  at  one 
location  can  affect  parameter  values  at  other  locations,  a  task  not  possible  using  a  grid 
search.  Finally,  the  SA  algorithm  can  very  quickly  produce  approximate  answers  useful 
in  practical  applications. 
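
The reuse of simulation results summarized above can be pictured as memoization: a crop model run is stored under its parameter combination, so any later iteration or any other location in the same environment that requests the same combination gets the stored result. The sketch below is an assumption-laden illustration, not the Chapter 5 implementation. `simulate_yield` is a hypothetical stand-in for an expensive model run, and the parameters are assumed to be discretized so that they can serve as cache keys.

```python
from functools import lru_cache

calls = 0  # counts how many times the "expensive" model actually runs


@lru_cache(maxsize=None)
def simulate_yield(soil_depth_cm, curve_number):
    """Hypothetical stand-in for one crop model run (made-up response surface)."""
    global calls
    calls += 1
    return 100.0 * soil_depth_cm / (soil_depth_cm + 50.0) - 0.1 * curve_number


def objective(params_by_location, observed):
    """Sum of squared yield errors over all locations."""
    return sum(
        (simulate_yield(*params) - obs) ** 2
        for params, obs in zip(params_by_location, observed)
    )


# Two of the three locations share a parameter combination.
params = [(100, 70), (100, 70), (150, 70)]
observed = [60.0, 58.0, 70.0]

first = objective(params, observed)
runs_after_first = calls           # only two distinct combinations were run
second = objective(params, observed)
print(runs_after_first, calls)     # the second evaluation triggers no new runs
```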

Chapter  6  dealt  with  Bayesian  networks:  probabilistic  tools  for  making  inferences 
with  different  sources  of  knowledge.  Bayesian  networks  provide  a  powerful  tool  for 
discussing  complex  concepts.  With  some  practice,  crop  consultants,  extension 
professionals,  and  clients  with  minimal  experience  using  personal  computers  can  easily 
understand  the  probabilistic  ideas  behind  Bayesian  networks,  as  well  as  the  use  of 
interactive software tools for their construction. Effective model definition improves with
experience; the crop consultants and extension professionals with whom we interacted
rapidly became skilled enough to define and populate simple models in 1-2 hours.

Tools for Bayesian network modeling are readily available on the market. For
example, a powerful trial version of the Netica software used for this study can be
downloaded from the manufacturer's website (www.norsys.com).

The  techniques  briefly  described  in  Chapter  6  are  not  limited  to  the  discussion  of 
agricultural  systems  management.  Any  extension  activity  that  requires  discussion  of 
cause-effect relationships, such as health care, safety, and mechanics-related topics, can
benefit from Bayesian network-supported dialogue.
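
The kind of cause-effect reasoning such a network supports can be sketched with a three-node example evaluated by brute-force enumeration. The network structure, node names, and probabilities below are hypothetical and were not elicited in this study; dedicated tools such as Netica perform the same inference on far larger models.

```python
from itertools import product

# Hypothetical prior probabilities of two independent causes.
p_drought = 0.3
p_compaction = 0.2

# Hypothetical conditional table: P(low yield | drought, compaction).
p_low_yield = {
    (True, True): 0.95,
    (True, False): 0.70,
    (False, True): 0.50,
    (False, False): 0.10,
}


def joint(drought, compaction, low_yield):
    """Joint probability of one complete assignment of the three nodes."""
    p = p_drought if drought else 1.0 - p_drought
    p *= p_compaction if compaction else 1.0 - p_compaction
    p_ly = p_low_yield[(drought, compaction)]
    return p * (p_ly if low_yield else 1.0 - p_ly)


def posterior_drought_given_low_yield():
    """P(drought | low yield observed), by summing over the unobserved node."""
    numerator = sum(joint(True, c, True) for c in (True, False))
    evidence = sum(joint(d, c, True) for d, c in product((True, False), repeat=2))
    return numerator / evidence


posterior = posterior_drought_given_low_yield()
print(posterior)
```

Observing a low yield raises the belief in drought above its prior, which is the sort of statement a consultant and a client can discuss directly.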

In  Chapter  7  we  introduced  a  novel  concept  for  parameterizing  spatial  biophysical 
models:  a  framework  for  eliciting  and  using  expert  knowledge  (from  domain  experts  such 
as  farmers,  crop  consultants,  and  NRCS  soil  scientists),  together  with  historical  yield 
data,  in  an  inverse  modeling  context. 

We  qualitatively  and  quantitatively  defined  a  model  of  spatial  relationships 
between  model  inputs  or  outputs  over  space  (spatial  parameter  model,  or  SPM),  and 
applied  it  to  a  representative  case  study  using  locally  available  sources  of  knowledge.  The 
SPM consists of a series of criteria, each of which constrains a different model input or
output, that are aggregated using an ordered weighted averaging (OWA) operator.

We  found  that,  based  on  existing  data  such  as  RTK  elevation  maps, 
electroconductivity  maps,  soil  survey  maps,  and  simple  field  observations  made  with  a 
soil  probe,  it  was  possible  to  identify  appropriate  simulation  locations,  to  select 
parameters  to  estimate,  and  to  populate  the  different  criteria  that  comprise  the  SPM. 

We  applied  our  framework  (using  criteria  based  on  yield,  or  on  soil  depth)  to  the 
parameterization  by  inverse  modeling  (IM)  of  simple  spatially-uncoupled  and  coupled 
crop models. The models' ability to reproduce observed yield was good with both two
and three years of yield data used for parameter estimation. However, their ability to
reproduce observed spatiotemporal soil water patterns was landscape-position-dependent.
There were several spurious compensations between parameters, which resulted in
parameter combinations that best simulated observed yields but were not consistent with
soil probe observations or expert opinions. The results of the two models were similar: the coupled model
did not produce noteworthy improvements over the spatially-uncoupled model, although
the latter clearly compensated for the lack of run-on by means of spurious parameter value
combinations. These artifacts can be countered using tools such as the soil map
neighborhood criteria developed herein.

Numerous  opportunities  for  further  research  exist,  especially  in  developing  a 
method  that  can  provide  the  user  with  estimates  of  parameter  uncertainty,  which  our  IM 
framework  currently  lacks.  Separating  the  knowledge-based  elements  of  the  IM 
framework  from  running  the  crop  model,  together  with  Monte  Carlo  techniques,  may 
provide  such  an  opportunity.  Our  IM  framework  could  be  used  to  estimate  a  joint  a  priori 
parameter  distribution  for  use  in  a  formal  parameterization  setting  combining  crop 
models  with  a  tool  such  as  an  ensemble  Kalman  filter. 


APPENDIX  A 

THE  SIMULATED  ANNEALING  ALGORITHMS  USED  IN  CHAPTER  4 
Sacks  and  Schiller  Algorithm 

The fitness function has the form $J(S)$ with $S \subset D$, where $D$ represents the
spatial domain of interest and $S$ is a subset of $D$ containing 10 locations. We used two
different fitness functions, SKV and SMSE, described in the text.

The Sacks and Schiller generation mechanism and cooling schedule are as follows:

1. Create an initial subset $S^0 \subset D$ from the union of 10 randomly selected locations of $D$.
Set $\pi^0 = 0.7$.

2. Initiate the $(j+1)$th iteration of the process by randomly selecting a tentative entering
location $t$ from $D - S^j$.

3. Find an optimal leaving location $s^* \in S^j$ so that

\[ J(S^j \cup t - s^*) = \min_{s \in S^j} J(S^j \cup t - s). \]

4. Take

\[ S^{j+1} = \begin{cases}
S^j \cup t - s^* & \text{if } J(S^j \cup t - s^*) < J(S^j) \\
S^j \cup t - s^* & \text{with probability } \pi^j \text{ if } J(S^j \cup t - s^*) \ge J(S^j) \\
S^j & \text{with probability } (1 - \pi^j) \text{ if } J(S^j \cup t - s^*) \ge J(S^j)
\end{cases} \]

5. If $S^{j+1} = S^j$ in step 4, randomly select an alternative entering location $t' \in D - S^j - t$
and go to step 3 without incrementing $j$, replacing $t$ with $t'$.

6. Repeat step 5 up to $L$ times, where $L$ is a constant. If after $L$ attempts there have
been no changes, assign $S^{j+1} = S^j$, $\pi^{j+1} = \min(1, \pi^j / (1 - d'))$ and go to step 2.

7. Take

\[ \pi^{j+1} = \begin{cases}
(1 - d)\,\pi^j & \text{if } J(S^{j+1}) < (1 - \alpha) \min_{k \le j} J(S^k) \\
\pi^j & \text{otherwise}
\end{cases} \]

8. Stop if there have been $M$ iterations without changes in $\pi^j$, where $M$ is a constant.
Otherwise go to step 2.

Our adopted parameter values were somewhat more conservative than the ones
proposed by Sacks and Schiller (1988), as follows: $\alpha = 0.01$, $\pi^0 = 0.7$, $d = 0.3$, $d' = 0.2$,
$L = 57 - 10 = 47$, and $M = 500$. These values cool and re-heat the system more slowly
than the optimum values proposed by Sacks and Schiller.
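
The eight steps above can be sketched compactly in Python. This is an interpretation of the listed steps, not the code used in Chapter 4: the function name, the best-so-far bookkeeping, the toy `coverage` fitness, and the one-dimensional domain are all assumptions, and the fitness is assumed to be positive and minimized.

```python
import random


def sacks_schiller(domain, fitness, n_select=10, pi0=0.7, alpha=0.01,
                   d=0.3, d_prime=0.2, max_tries=None, max_stall=500, seed=0):
    """Sketch of steps 1-8; returns the best subset found and its fitness."""
    rng = random.Random(seed)
    domain = list(domain)
    if max_tries is None:
        max_tries = len(domain) - n_select            # L in the text
    current = set(rng.sample(domain, n_select))       # step 1
    j_current = fitness(current)
    best, j_best = set(current), j_current
    pi, stall = pi0, 0
    while stall < max_stall:                          # step 8
        outside = [p for p in domain if p not in current]
        rng.shuffle(outside)
        moved = False
        for t in outside[:max_tries]:                 # steps 2, 5 and 6
            # Step 3: best leaving location for this entering location.
            trial = min(((current - {s}) | {t} for s in current), key=fitness)
            j_trial = fitness(trial)
            # Step 4: always accept improvements, otherwise with probability pi.
            if j_trial < j_current or rng.random() < pi:
                current, j_current, moved = trial, j_trial, True
                break
        previous_pi = pi
        if not moved:
            pi = min(1.0, pi / (1.0 - d_prime))       # step 6: re-heat
        elif j_current < (1.0 - alpha) * j_best:
            pi = (1.0 - d) * pi                       # step 7: cool
        if j_current < j_best:
            best, j_best = set(current), j_current
        stall = stall + 1 if pi == previous_pi else 0
    return best, j_best


# Toy problem: choose 3 of 12 points on a line so that no point is far from
# a selected one. The "+ 1.0" keeps the fitness strictly positive.
points = list(range(12))


def coverage(subset):
    return 1.0 + max(min(abs(p - s) for s in subset) for p in points)


best, j_best = sacks_schiller(points, coverage, n_select=3, max_stall=50)
print(sorted(best), j_best)
```

Because the acceptance probability can rise as well as fall, a run may wander for some time before stopping, which is consistent with the recommendation to run several instances of the optimization.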

Spatial  Simulated  Annealing 

Acceptance  Criterion 

Let $S^j$ and $S'$ be two subsets of the domain $D$ such that $S'$ is generated by
perturbing $S^j$. Let the corresponding fitness function values be $J(S^j)$ and $J(S')$,
respectively. The acceptance criterion determines whether $S'$ replaces $S^j$ or not. The
acceptance probability is defined as:

\[ P(S^{j+1} = S') = \begin{cases}
1 & \text{if } J(S') \le J(S^j) \\
\exp\left( \dfrac{J(S^j) - J(S')}{c} \right) & \text{if } J(S') > J(S^j)
\end{cases} \]

where $c$ is a positive control parameter (a function of the iteration) that decreases as the algorithm
progresses.
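
A minimal sketch of this acceptance rule follows. It is the standard Metropolis criterion written out in Python; the function names are hypothetical.

```python
import math
import random


def acceptance_probability(j_current, j_candidate, c):
    """Probability of replacing the current pattern with the candidate."""
    if j_candidate <= j_current:
        return 1.0  # improvements (or ties) are always accepted
    return math.exp((j_current - j_candidate) / c)


def accept(j_current, j_candidate, c, rng=random):
    """Draw the accept/reject decision for one candidate pattern."""
    return rng.random() < acceptance_probability(j_current, j_candidate, c)


# The same deterioration is tolerated less and less as c decreases.
for c in (1.0, 0.1, 0.01):
    print(c, acceptance_probability(1.0, 1.2, c))
```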

Generation  Mechanism 

In the SSA algorithm, a new pattern $S'$ is generated from $S^j$ by shifting
the position of a randomly chosen point $s \in S^j$ over a vector $h$, with the direction of $h$
being chosen randomly, and its magnitude $|h|$ set by generating a random value between
0 and a function $h_{max}$, which has an initial value greater than or equal to the length of the
sampling region and decreases with time.

The modification we made to this generation mechanism was to discretize it so that
only locations on the 57-point isometric grid could be selected as destinations for $s$. We
simplified the direction selection to the random selection of a quadrant around point $s$,
then randomly picked one of the unused microwatershed sampling points located within
the specified quadrant and at a distance less than or equal to $h_{max}$ from $s$.
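
The discretized move can be sketched as follows. The sketch uses a hypothetical 5 x 5 square grid in place of the 57-point isometric grid, and it assumes that grid points lying exactly on an axis through $s$ count as belonging to both adjacent quadrants and that the pattern is left unchanged when the chosen quadrant holds no legal destination; neither detail is stated in the text.

```python
import math
import random


def propose_move(selected, grid, h_max, rng):
    """Move one selected point to an unused grid point in a random quadrant."""
    s = rng.choice(sorted(selected))
    sign_x, sign_y = rng.choice([(1, 1), (-1, 1), (-1, -1), (1, -1)])
    candidates = [
        p for p in grid
        if p not in selected
        and 0 < math.dist(p, s) <= h_max
        and (p[0] - s[0]) * sign_x >= 0
        and (p[1] - s[1]) * sign_y >= 0
    ]
    if not candidates:
        return set(selected)  # no legal destination: keep the pattern as is
    return (set(selected) - {s}) | {rng.choice(candidates)}


grid = [(x, y) for x in range(5) for y in range(5)]  # hypothetical grid
pattern = {(0, 0), (2, 2), (4, 4)}
rng = random.Random(1)
new_pattern = propose_move(pattern, grid, h_max=2.0, rng=rng)
print(sorted(new_pattern))
```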
Cooling  Schedule 

The cooling schedule changes the values of $c$ and $h_{max}$ as the algorithm progresses.
The functions are:

\[ c^{j+1} = \alpha_c \cdot c^j, \qquad h_{max}^{j+1} = \alpha_h \cdot h_{max}^j \]

with the values of $\alpha_c$ and $\alpha_h$ being only slightly less than 1.

We stopped the algorithm if $M$ iterations passed without improvement in the value
of $J(S)$. The parameter values we adopted were: $c^0 = 1$, $\alpha_c = 0.995$, $\alpha_h = 0.997$, $M = 2000$.
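
Both recurrences are geometric, so after $j$ iterations $c^j = \alpha_c^j c^0$ and $h_{max}^j = \alpha_h^j h_{max}^0$. The short sketch below generates the schedule with the adopted values of $c^0$, $\alpha_c$, and $\alpha_h$; the initial $h_{max}$ of 100 is a hypothetical placeholder, since the text only requires it to be at least the length of the sampling region.

```python
from itertools import islice


def cooling_schedule(c0=1.0, alpha_c=0.995, h0=100.0, alpha_h=0.997):
    """Yield (c, h_max) for iterations j = 0, 1, 2, ..."""
    c, h_max = c0, h0
    while True:
        yield c, h_max
        c, h_max = alpha_c * c, alpha_h * h_max


first_steps = list(islice(cooling_schedule(), 3))
print(first_steps)

# Values reached after 1000 iterations, compared with the closed form.
c_1000, h_1000 = next(islice(cooling_schedule(), 1000, None))
print(c_1000, 0.995 ** 1000)
```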


APPENDIX  B 

NEIGHBORHOOD  CRITERIA  DATA  AND  SOURCE  CODE 


Depth  Criterion 


The soil neighborhood criteria described in Chapter 7 are encoded as matrices.
Table B-1 shows how the soil depth criterion is encoded. The information contained in
the table corresponds to Figure 7-21. Only the first three columns are used; the last two
(right-most) columns are for reference purposes.


Table B-1. Matrix encoding for the soil neighborhood criterion.

Row  Column  Link type,   Origin     Destination
             strength     soil unit  soil unit
1    1       n/a          PuB2       PuB2
1    2       n/a          PuB2       CaA(E)
1    3       n/a          PuB2       CaA(W)
1    4       n/a          PuB2       GrA
1    5       n/a          PuB2       CaB
1    6       n/a          PuB2       Hn
1    7       n/a          PuB2       CaA(N)
1    8       <<           PuB2       LoB
1    9       n/a          PuB2       LoC2
1    10      n/a          PuB2       GrB(N)
1    11      n/a          PuB2       CaA(S)
1    12      n/a          PuB2       GrB
1    13      n/a          PuB2       GrB(Ba)
2    1       n/a          CaA(E)     PuB2
2    2       n/a          CaA(E)     CaA(E)
2    3       n/a          CaA(E)     CaA(W)
2    4       <=           CaA(E)     GrA
2    5       <            CaA(E)     CaB
2    6       <            CaA(E)     Hn
2    7       n/a          CaA(E)     CaA(N)
2    8       n/a          CaA(E)     LoB
2    9       n/a          CaA(E)     LoC2
2    10      n/a          CaA(E)     GrB(N)
2    11      <            CaA(E)     CaA(S)
2    12      <=           CaA(E)     GrB
2    13      n/a          CaA(E)     GrB(Ba)
3    1       n/a          CaA(W)     PuB2
3    2       n/a          CaA(W)     CaA(E)
3    3       n/a          CaA(W)     CaA(W)
3    4       n/a          CaA(W)     GrA
3    5       n/a          CaA(W)     CaB
3    6       n/a          CaA(W)     Hn
3    7       n/a          CaA(W)     CaA(N)
3    8       >=           CaA(W)     LoB
3    9       n/a          CaA(W)     LoC2
3    10      n/a          CaA(W)     GrB(N)
3    11      n/a          CaA(W)     CaA(S)
3    12      >=           CaA(W)     GrB
3    13      <            CaA(W)     GrB(Ba)
4    1       n/a          GrA        PuB2
4    2       >=           GrA        CaA(E)
4    3       n/a          GrA        CaA(W)
4    4       n/a          GrA        GrA
4    5       n/a          GrA        CaB
4    6       n/a          GrA        Hn
4    7       n/a          GrA        CaA(N)
4    8       n/a          GrA        LoB
4    9       n/a          GrA        LoC2
4    10      n/a          GrA        GrB(N)
4    11      [illegible]  GrA        CaA(S)
4    12      n/a          GrA        GrB
4    13      n/a          GrA        GrB(Ba)
5    1       n/a          CaB        PuB2
5    2       >            CaB        CaA(E)
5    3       n/a          CaB        CaA(W)
5    4       n/a          CaB        GrA
5    5       n/a          CaB        CaB
5    6       n/a          CaB        Hn
5    7       [illegible]  CaB        CaA(N)
5    8       n/a          CaB        LoB
5    9       n/a          CaB        LoC2
5    10      >            CaB        GrB(N)
5    11      n/a          CaB        CaA(S)
5    12      n/a          CaB        GrB
5    13      n/a          CaB        GrB(Ba)
6    1       n/a          Hn         PuB2
6    2       >            Hn         CaA(E)
6    3       n/a          Hn         CaA(W)
6    4       n/a          Hn         GrA
6    5       n/a          Hn         CaB
6    6       n/a          Hn         Hn
6    7       n/a          Hn         CaA(N)
6    8       n/a          Hn         LoB
6    9       n/a          Hn         LoC2
6    10      n/a          Hn         GrB(N)
6    11      [illegible]  Hn         CaA(S)
6    12      >            Hn         GrB
6    13      n/a          Hn         GrB(Ba)
7    1       n/a          CaA(N)     PuB2
7    2       n/a          CaA(N)     CaA(E)
7    3       n/a          CaA(N)     CaA(W)
7    4       n/a          CaA(N)     GrA
7    5       [illegible]  CaA(N)     CaB
7    6       n/a          CaA(N)     Hn
7    7       n/a          CaA(N)     CaA(N)
7    8       n/a          CaA(N)     LoB
7    9       n/a          CaA(N)     LoC2
7    10      >            CaA(N)     GrB(N)
7    11      n/a          CaA(N)     CaA(S)
7    12      >            CaA(N)     GrB
7    13      n/a          CaA(N)     GrB(Ba)
8    1       >>           LoB        PuB2
8    2       n/a          LoB        CaA(E)
8    3       <=           LoB        CaA(W)
8    4       n/a          LoB        GrA
8    5       n/a          LoB        CaB
8    6       n/a          LoB        Hn
8    7       n/a          LoB        CaA(N)
8    8       n/a          LoB        LoB
8    9       >=           LoB        LoC2
8    10      n/a          LoB        GrB(N)
8    11      n/a          LoB        CaA(S)
8    12      [illegible]  LoB        GrB
8    13      <<           LoB        GrB(Ba)
9    1       n/a          LoC2       PuB2
9    2       n/a          LoC2       CaA(E)
9    3       n/a          LoC2       CaA(W)
9    4       n/a          LoC2       GrA
9    5       n/a          LoC2       CaB
9    6       n/a          LoC2       Hn
9    7       n/a          LoC2       CaA(N)
9    8       <=           LoC2       LoB
9    9       n/a          LoC2       LoC2
9    10      n/a          LoC2       GrB(N)
9    11      n/a          LoC2       CaA(S)
9    12      n/a          LoC2       GrB
9    13      n/a          LoC2       GrB(Ba)
10   1       n/a          GrB(N)     PuB2
10   2       n/a          GrB(N)     CaA(E)
10   3       n/a          GrB(N)     CaA(W)
10   4       n/a          GrB(N)     GrA
10   5       <            GrB(N)     CaB
10   6       n/a          GrB(N)     Hn
10   7       <            GrB(N)     CaA(N)
10   8       n/a          GrB(N)     LoB
10   9       n/a          GrB(N)     LoC2
10   10      n/a          GrB(N)     GrB(N)
10   11      n/a          GrB(N)     CaA(S)
10   12      [illegible]  GrB(N)     GrB
10   13      n/a          GrB(N)     GrB(Ba)
11   1       n/a          CaA(S)     PuB2
11   2       >            CaA(S)     CaA(E)
11   3       n/a          CaA(S)     CaA(W)
11   4       [illegible]  CaA(S)     GrA
11   5       n/a          CaA(S)     CaB
11   6       [illegible]  CaA(S)     Hn
11   7       n/a          CaA(S)     CaA(N)
11   8       n/a          CaA(S)     LoB
11   9       n/a          CaA(S)     LoC2
11   10      n/a          CaA(S)     GrB(N)
11   11      n/a          CaA(S)     CaA(S)
11   12      >            CaA(S)     GrB
11   13      n/a          CaA(S)     GrB(Ba)
12   1       n/a          GrB        PuB2
12   2       >=           GrB        CaA(E)
12   3       <=           GrB        CaA(W)
12   4       n/a          GrB        GrA
12   5       n/a          GrB        CaB
12   6       <            GrB        Hn
12   7       <            GrB        CaA(N)
12   8       [illegible]  GrB        LoB
12   9       n/a          GrB        LoC2
12   10      [illegible]  GrB        GrB(N)
12   11      <            GrB        CaA(S)
12   12      n/a          GrB        GrB
12   13      <<           GrB        GrB(Ba)
13   1       n/a          GrB(Ba)    PuB2
13   2       n/a          GrB(Ba)    CaA(E)
13   3       >            GrB(Ba)    CaA(W)
13   4       n/a          GrB(Ba)    GrA
13   5       n/a          GrB(Ba)    CaB
13   6       n/a          GrB(Ba)    Hn
13   7       n/a          GrB(Ba)    CaA(N)
13   8       >>           GrB(Ba)    LoB
13   9       n/a          GrB(Ba)    LoC2
13   10      n/a          GrB(Ba)    GrB(N)
13   11      n/a          GrB(Ba)    CaA(S)
13   12      >>           GrB(Ba)    GrB
13   13      n/a          GrB(Ba)    GrB(Ba)
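
To make the encoding concrete, the sketch below turns (row, column, link type) triples into a list of arcs, keeping only the upper triangle and skipping `n/a` entries, in the spirit of the loader in the Pascal unit listed later in this appendix. It is written in Python for brevity and is not a translation of that unit. The integer codes follow the constants declared in the listing; the ASCII spellings `<<` and `>>` for the strong relationships are assumptions, as is the function name `load_arcs`.

```python
# Integer codes as declared in the Pascal unit (cNoRelationship .. cWeaklyLesser).
RELATION_CODES = {
    "N/A": 0,   # no relationship
    ">>": 1,    # strongly greater (ASCII spelling assumed)
    ">": 2,     # greater
    ">=": 3,    # weakly greater
    "<<": 4,    # strongly lesser (ASCII spelling assumed)
    "<": 5,     # lesser
    "<=": 6,    # weakly lesser
}


def load_arcs(rows, n_nodes, upper_only=True):
    """Return (tail, head, code) for every usable entry of the matrix."""
    arcs = []
    for tail, head, link in rows:
        if not (1 <= tail <= n_nodes and 1 <= head <= n_nodes):
            raise ValueError(f"node index out of range: {tail}, {head}")
        code = RELATION_CODES[link.strip().upper()]
        if code == 0 or (upper_only and tail >= head):
            continue  # nothing to evaluate, or mirrored by an upper-half entry
        arcs.append((tail, head, code))
    return arcs


# A few entries in the format of Table B-1: (row, column, link type).
rows = [(1, 1, "n/a"), (1, 8, "<<"), (2, 4, "<="), (4, 2, ">="), (3, 13, "<")]
arcs = load_arcs(rows, n_nodes=13)
print(arcs)
```

Keeping only entries with column greater than row halves the number of arcs to evaluate, which is safe whenever each lower-half entry is the mirror image of an upper-half one, as with rows (2, 4) and (4, 2) here.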
Source Code for Soil Map Neighborhood and Yield History Criteria, OWA Operator

The source code for evaluation of the IM framework criteria used in this study, as
well as the OWA operator, was written as a unit in Borland Pascal 7.0 (Borland, 1992). It
is listed below.


Unit Criteria;

(* Data structure: each constraint table has to have a list of its non-nil *)
(* relationships, so we evaluate only what we need. *)

(* There is a boolean constant, cUseUpperOnly, which can be used to speed things up by
   using a triangular (upper half) matrix. *)

(* Taking only those entries in which j > i will use the upper half. *)

INTERFACE

const
  cUseUpperOnly     = true;   (* Used to speed things up assuming symmetric constraints *)
  cNoData: Real     = -99.9;  (* Used to communicate calculation errors *)
  cAlmostZero       = 1e-6;   (* Used to avoid errors and implement some criteria *)
  cMaxTableEntries  = 1024;   (* used to limit size of data structures *)
  cMaxNodes         = 32;     (* used to limit size of data structures *)
  cMaxCalibCols     = 5;
  cUseMSE           = 0;      (* Tells yield criterion to use RMSE *)
  cUseExponential   = 1;      (* Tells yield criterion to use exponential *)
  eNumNodesMismatch = 1000;   (* An error condition *)

  cNoRelationship   = 0;
  cStronglyGreater  = 1;
  cGreater          = 2;
  cWeaklyGreater    = 3;
  cStronglyLesser   = 4;
  cLesser           = 5;
  cWeaklyLesser     = 6;
  cStronglySimilar  = 7;
  cSimilar          = 8;
  cWeaklySimilar    = 9;
  cError            = 10;

  cMaxYieldCols     = 5;
  cMaxCriteria      = 20;

type
  pIDVector   = ^tIDVector;
  tIDVector   = Array[1..cMaxYieldCols] of integer;

  pDataVector = ^tDataVector;
  tDataVector = Array[1..cMaxNodes] of Real;

  pCalibYieldDataVector = ^tCalibYieldDataVector;
  tCalibYieldDataVector = Array[1..cMaxNodes, 1..cMaxCalibCols] of Real;
  tCriteriaVector       = Array[1..cMaxCriteria] of Real;

(* ------------------------------------------------------------------ *)

  pNeighborhoodCriterion = ^tNeighborhoodCriterion;
  tNeighborhoodCriterion = object
    ArcTails:         Array[1..cMaxTableEntries] of word;
    ArcHeads:         Array[1..cMaxTableEntries] of word;
    ArcRelationships: Array[1..cMaxTableEntries] of integer;
    NumArcs:          word;
    Error:            Integer;

    GT_e_Strong: Real;
    GT_e_Medium: Real;
    GT_e_Weak:   Real;

    LT_e_Strong: Real;
    LT_e_Medium: Real;
    LT_e_Weak:   Real;

    EQ_e_Strong: Real;
    EQ_e_Medium: Real;
    EQ_e_Weak:   Real;
    EQ_d_Strong: Real;
    EQ_d_Medium: Real;
    EQ_d_Weak:   Real;

    Range:       Real;

    Constructor Init(Filename: OpenString;
                     iGT_e_Strong, iGT_e_Medium, iGT_e_Weak: Real;
                     iLT_e_Strong, iLT_e_Medium, iLT_e_Weak: Real;
                     iEQ_e_Strong, iEQ_e_Medium, iEQ_e_Weak,
                     iEQ_d_Strong, iEQ_d_Medium, iEQ_d_Weak: Real;
                     iRange: Real);
    Destructor  Done;
    Function    EvaluateArc(RelDiff: Double; Code: Integer): Double;
    Function    Evaluate(TheData: pDataVector): Real;
  end;

(* ------------------------------------------------------------------ *)

  pYieldCriterion = ^tYieldCriterion;
  tYieldCriterion = object
    Error:     integer;
    NumNodes:  Word;
    CalibCols: word;  (* Number of yield years used for calibration *)
    k:         Real;
    Method:    word;
    ObsData:   tCalibYieldDataVector;
    Variance:  Array[1..cMaxCalibCols] of Real;
    YearMean:  Array[1..cMaxCalibCols] of Real;

    Constructor Init(Filename: OpenString; NumYieldCols, iMethod: Word; TheK: Real);
    Destructor  Done;
    Function    Evaluate(SimData: pCalibYieldDataVector): Real;
    Function    CoincidenceCode(SimData: pCalibYieldDataVector; Location, NumYears: word): Word;
  end;

(* ------------------------------------------------------------------ *)

  tOWA = object
    (* User is responsible for populating the vectors *)
    OutFile:     text;
    NumCriteria: word;
    Error:       Integer;
    Weights:     tCriteriaVector;
    Values:      tCriteriaVector;

    ReportInterval, Counter: LongInt;

    Rank2Acc:    Array[1..cMaxCriteria] of Double;
    OWAAcc:      Double;
    ValueAcc:    Array[1..cMaxCriteria] of Double;

    Constructor Init(iNumCriteria, iter: word; iReportInterval: LongInt);
    Destructor  Done;
    Function    Evaluate: Real;
  end;

(* ------------------------------------------------------------------ *)

IMPLEMENTATION

function AlmostEqual(x, y: real): Boolean;
begin
  if (abs(x - y) < cAlmostZero) then
    AlmostEqual := true
  else
    AlmostEqual := false;
end;

(* ------------------------------------------------------------------ *)

Constructor tNeighborhoodCriterion.Init(Filename: OpenString;
                   iGT_e_Strong, iGT_e_Medium, iGT_e_Weak: Real;
                   iLT_e_Strong, iLT_e_Medium, iLT_e_Weak: Real;
                   iEQ_e_Strong, iEQ_e_Medium, iEQ_e_Weak,
                   iEQ_d_Strong, iEQ_d_Medium, iEQ_d_Weak: Real;
                   iRange: Real);
var f: text;
    Tail, Head, NodeNum: Word;
    RelStr: String;
    RelStr2: String[4];
    Relationship: Integer;
    i: Integer;
begin
  GT_e_Strong := iGT_e_Strong;
  GT_e_Medium := iGT_e_Medium;
  GT_e_Weak   := iGT_e_Weak;
  LT_e_Strong := iLT_e_Strong;
  LT_e_Medium := iLT_e_Medium;
  LT_e_Weak   := iLT_e_Weak;
  EQ_e_Strong := iEQ_e_Strong;
  EQ_e_Medium := iEQ_e_Medium;
  EQ_e_Weak   := iEQ_e_Weak;
  EQ_d_Strong := iEQ_d_Strong;
  EQ_d_Medium := iEQ_d_Medium;
  EQ_d_Weak   := iEQ_d_Weak;
  Range       := iRange;

  NumArcs := 0;
  Assign(f, FileName);
  Reset(f);
  error := IOResult;

  if (error = 0) then
  begin
    Readln(f, NodeNum);   (* first line of the file: number of nodes *)
    error := IOResult;

    while (error = 0) and (not eof(f)) do
    begin
      readln(f, Tail, Head, RelStr);
      error := IOResult;

      if (error = 0) then
      begin
        (* --- Clean RelStr: strip leading blanks and tabs --- *)
        while (Length(RelStr) > 1) and ((RelStr[1] = ' ') or (RelStr[1] = #9)) do
          RelStr := Copy(RelStr, 2, Length(RelStr) - 1);

        (* Copy the first token, in upper case, into RelStr2 *)
        RelStr2 := '';
        i := 1;
        while (i <= length(RelStr)) and (i > 0) do
        begin
          if (RelStr[i] <> ' ') and (RelStr[i] <> #9) then
          begin
            RelStr2 := RelStr2 + upcase(RelStr[i]);
            inc(i);
          end
          else
            i := -1;
        end;

        if (RelStr2 = 'N/A') then Relationship := cNoRelationship  else
        if (RelStr2 = '>>')  then Relationship := cStronglyGreater else
        if (RelStr2 = '>')   then Relationship := cGreater         else
        if (RelStr2 = '>=')  then Relationship := cWeaklyGreater   else
        if (RelStr2 = '<<')  then Relationship := cStronglyLesser  else
        if (RelStr2 = '<')   then Relationship := cLesser          else
        if (RelStr2 = '<=')  then Relationship := cWeaklyLesser    else
        if (RelStr2 = '==')  then Relationship := cStronglySimilar else
        if (RelStr2 = '=')   then Relationship := cSimilar         else
        (* the literal tested for cWeaklySimilar is illegible in the source listing *)
        if (RelStr2 = '...') then Relationship := cWeaklySimilar   else
          Relationship := cError;

        if (Tail >= 1) and (Tail <= NodeNum) and
           (Head >= 1) and (Head <= NodeNum) and (Relationship <> cError) then
        begin
          if (cUseUpperOnly and (Tail >= Head)) or
             (Relationship = cNoRelationship) then
          begin
            (* lower-half or empty entry: nothing to store *)
          end
          else
          begin
            inc(NumArcs);
            ArcTails[NumArcs]         := Tail;
            ArcHeads[NumArcs]         := Head;
            ArcRelationships[NumArcs] := Relationship;
          end;
        end
        else
          error := eNumNodesMismatch;
      end;
    end;

    Close(f);
  end;
end;


Destructor tNeighborhoodCriterion.Done;
begin
end;

(* ------------------------------------------------------------------ *)

Function RelDiff(a, b, Range: Real): Real;
begin
  if (abs(Range) < cAlmostZero) then
    RelDiff := cNoData
  else
    RelDiff := (a - b) / Range;
end;

(*  -■ 


Function  tNeighborhoodCriterion.EvaluateArc(RelDiff :  Double;  Code:  Integer):  Double; 

var  x,y,  xl,x2,x3,x4,x5,x6,  yl,y2,y3,y4,y5,y6:  Double; 

begin 


  case Code of
    cStronglyGreater: begin
      x1 := -100; y1 := 1;
      x2 := -50; y2 := 1;
      x3 := -GT_e_Strong; y3 := 1;
      x4 := GT_e_Strong + cAlmostZero; y4 := 0;
      x5 := 50; y5 := 0;
      x6 := 100; y6 := 0;
    end;

    cGreater: begin
      x1 := -100; y1 := 1;
      x2 := -50; y2 := 1;
      x3 := -GT_e_Medium; y3 := 1;
      x4 := GT_e_Medium + cAlmostZero; y4 := 0;
      x5 := 50; y5 := 0;
      x6 := 100; y6 := 0;
    end;

    cWeaklyGreater: begin
      x1 := -100; y1 := 1;
      x2 := -50; y2 := 1;
      x3 := -GT_e_Weak; y3 := 1;
      x4 := GT_e_Weak + cAlmostZero; y4 := 0;
      x5 := 50; y5 := 0;
      x6 := 100; y6 := 0;
    end;

    cStronglyLesser: begin
      x1 := -100; y1 := 0;
      x2 := -50; y2 := 0;
      x3 := -LT_e_Strong - cAlmostZero; y3 := 0;
      x4 := LT_e_Strong; y4 := 1;
      x5 := 50; y5 := 1;
      x6 := 100; y6 := 1;
    end;

    cLesser: begin
      x1 := -100; y1 := 0;
      x2 := -50; y2 := 0;
      x3 := -LT_e_Medium - cAlmostZero; y3 := 0;
      x4 := LT_e_Medium; y4 := 1;
      x5 := 50; y5 := 1;
      x6 := 100; y6 := 1;
    end;

    cWeaklyLesser: begin
      x1 := -100; y1 := 0;
      x2 := -50; y2 := 0;
      x3 := -LT_e_Weak - cAlmostZero; y3 := 0;
      x4 := LT_e_Weak; y4 := 1;
      x5 := 50; y5 := 1;
      x6 := 100; y6 := 1;
    end;

    cStronglySimilar: begin
      x1 := -100; y1 := 1;
      x2 := -EQ_e_Strong - cAlmostZero; y2 := 1;
      x3 := -EQ_d_Strong; y3 := 0;
      x4 := EQ_d_Strong; y4 := 0;
      x5 := EQ_e_Strong + cAlmostZero; y5 := 1;
      x6 := 100; y6 := 1;
    end;

    cSimilar: begin
      x1 := -100; y1 := 1;
      x2 := -EQ_e_Medium; y2 := 1;
      x3 := -EQ_d_Medium; y3 := 0;
      x4 := EQ_d_Medium; y4 := 0;
      x5 := EQ_e_Medium; y5 := 1;
      x6 := 100; y6 := 1;
    end;

    cWeaklySimilar: begin
      x1 := -100; y1 := 1;
      x2 := -EQ_e_Weak; y2 := 1;
      x3 := -EQ_d_Weak; y3 := 0;
      x4 := EQ_d_Weak; y4 := 0;
      x5 := EQ_e_Weak; y5 := 1;
      x6 := 100; y6 := 1;
    end;
  end;

  x := RelDiff;

  (* interpolate *)
  if (x <= x2) then
    y := y1 + (x - x1) * (y2 - y1) / (x2 - x1)
  else
  if (x <= x3) then
    y := y2 + (x - x2) * (y3 - y2) / (x3 - x2)
  else
  if (x <= x4) then
    y := y3 + (x - x3) * (y4 - y3) / (x4 - x3)
  else
  if (x <= x5) then
    y := y4 + (x - x4) * (y5 - y4) / (x5 - x4)
  else
    y := y5 + (x - x5) * (y6 - y5) / (x6 - x5);

  EvaluateArc := y;
end;

(*    *)

Function tNeighborhoodCriterion.Evaluate(TheData: pDataVector): Real;
var i, j: word;
    Code: Integer;
    ConstraintValue, Accum: Double;
    TailData, HeadData: real;
    TheRelDiff: Real;
begin
  Accum := 0;

  if (NumArcs <> 0) then
  begin
    for i := 1 to NumArcs do
    begin
      Code := ArcRelationships[i];
      TailData := TheData^[ArcTails[i]];
      HeadData := TheData^[ArcHeads[i]];

      if (Code <> cNoRelationship) then
      begin
        TheRelDiff := RelDiff(TailData, HeadData, Range);

        if (TheRelDiff <> cNoData) then
          ConstraintValue := EvaluateArc(TheRelDiff, Code);

        Accum := Accum + ConstraintValue;
      end;
    end;

    Accum := Accum / NumArcs
  end
  else
    Accum := cNoData;

  Evaluate := Accum;
end;

(*    *) 

Constructor tYieldCriterion.Init(Filename: OpenString; NumYieldCols, iMethod: Word;
                                 TheK: Real);
var f: Text;
    i, j: word;
    ID: word;
    s: String;
    Counter: Array[1..cMaxCalibCols] of LongInt;

  function Mini(x, y: Integer): Integer;
  begin
    if (x <= y) then
      Mini := x
    else
      Mini := y;
  end;

begin
  NumNodes := 0;
  CalibCols := 0;
  Method := iMethod;
  k := TheK;

  for i := 1 to cMaxCalibCols do
  begin
    Variance[i] := 0;
    YearMean[i] := 0;
    Counter[i] := 0;
  end;

  (*    *)

  Assign(f, FileName);
  Reset(f);

  error := IOResult;

  if (error = 0) then
  begin
    Readln(f, NumNodes, CalibCols);  (* read number of nodes, calibration columns *)
    error := IOResult;

    Readln(f, s);  (* Get column name header *)
    error := IOResult;

    if (error = 0) then  (* Get column IDs for the calibration years *)
    begin
      CalibCols := Mini(Mini(CalibCols, cMaxCalibCols), NumYieldCols);
      i := 1;

      if (CalibCols <> 0) then
        while (not eof(f)) and (error = 0) and (i <= NumNodes) do
        begin
          read(f, ID);

          if (ID <> i) then
            writeln('Warning! Observed yield data file has incorrect ID ordering!');

          for j := 1 to CalibCols do
          begin
            if (j = CalibCols) then
              Readln(f, ObsData[i, j])
            else
              Read(f, ObsData[i, j]);

            if not AlmostEqual(ObsData[i, j], cNoData) then
            begin
              Variance[j] := Variance[j] + sqr(ObsData[i, j]);
              YearMean[j] := YearMean[j] + ObsData[i, j];
              inc(Counter[j]);
            end;
          end;

          inc(i);
        end;
    end;

    (*    *)

    Close(f);
  end;

  if (CalibCols <> 0) and (NumNodes <> 0) then
  begin
    for j := 1 to CalibCols do
    begin
      Variance[j] := (Variance[j] - Counter[j] * sqr(YearMean[j] / Counter[j])) /
                     (Counter[j] - 1);
    end;
  end;
end;

Destructor  tYieldCriterion.Done; 
begin 

end; 

Function tYieldCriterion.Evaluate(SimData: pCalibYieldDataVector): Real;
var Counter: Longint;
    i, j: word;
    ErrorFunction: Double;
begin
  ErrorFunction := 0;
  Counter := 0;

  for i := 1 to NumNodes do
  begin
    for j := 1 to CalibCols do
    begin
      if ((not AlmostEqual(ObsData[i, j], cNoData)) and
          (not AlmostEqual(SimData^[i, j], cNoData))) then
      begin
        ErrorFunction := ErrorFunction + sqr(SimData^[i, j] - ObsData[i, j]) / Variance[j];
        inc(Counter);
      end;
    end;
  end;

  ErrorFunction := ErrorFunction / Counter;

  if (Method = cUseMSE) then
  begin
    { if (ErrorFunction > 1) then  (* COO! *)
        ErrorFunction := 1;
    }
  end
  else
  begin
    ErrorFunction := 1 - exp(-k * ErrorFunction);
  end;

  Evaluate := ErrorFunction;
end;

Function tYieldCriterion.CoincidenceCode(SimData: pCalibYieldDataVector;
                                         Location, NumYears: word): word;
var i: Word;
    TheMask: Word;
const Powers2: Array[1..8] of word = (1, 2, 4, 8, 16, 32, 64, 128);
begin
  TheMask := 0;

  for i := 1 to NumYears do
  begin
    if AlmostEqual(SimData^[Location, i], ObsData[Location, i]) then
      TheMask := TheMask + Powers2[i];
  end;

  CoincidenceCode := TheMask;
end;

(*    *) 

Constructor tOWA.Init(iNumCriteria, Iter: Word; iReportInterval: Longint);
var i: word;
    s: String[12];
begin
  Str(Iter, s);

  while (Length(s) < 4) do
    s := '0' + s;

  s := 'OWA_' + s + '.csv';

  Assign(OutFile, s);
  Rewrite(OutFile);

  error := IOResult;
  OWAAcc := 0;

  NumCriteria := iNumCriteria;

  for i := 1 to cMaxCriteria do
  begin
    Values[i] := 0;
    Weights[i] := 0;
    Rank2Acc[i] := 0;
    ValueAcc[i] := 0;
  end;

  ReportInterval := iReportInterval;
  Counter := 0;

  if (Error = 0) then
  begin
    write(OutFile, 'n');

    for i := 1 to NumCriteria do
      write(OutFile, ',C', i);

    for i := 1 to NumCriteria do
      write(OutFile, ',R2_', i);

    writeln(OutFile, ',OWA');
  end;
end;

(*    *)

Destructor tOWA.Done;
begin
  if (Error = 0) then
    Close(OutFile);

  if (IOResult = 0) then ;
end;

(*    *) 

Function tOWA.Evaluate;  (* We assume the values and weights have been populated *)
var i, j: Word;
    AuxR, AuxR2: Real;
    AuxW: Word;
    RankVector1, RankVector2: Array[1..cMaxCriteria] of word;
    other: tCriteriaVector;

(*
  RankVector1 says which of the incoming criteria is in each position of the
  sorted vector. [3 2 1 4] means that the third criterion has the highest value,
  the fourth criterion has the lowest, etc.

  RankVector2 is perhaps more useful. It shows what position each incoming criterion
  ended up in. [3 2 1 4] says that the first criterion was the 3rd largest,
  the second criterion was the second largest, the third criterion was the
  largest, etc.
*)

begin
  (*    *)

  inc(Counter);

  for i := 1 to NumCriteria do
  begin
    RankVector1[i] := i;  (* Init *)
    ValueAcc[i] := ValueAcc[i] + Values[i];  (* Add criteria values to Acc *)
  end;

  (* First bubble-sort the data *)
  if (NumCriteria > 1) then
    for i := 1 to NumCriteria - 1 do
    begin
      for j := 1 to NumCriteria - i do
      begin
        if (Values[j] < Values[j+1]) then
        begin
          AuxR := Values[j];
          Values[j] := Values[j+1];
          Values[j+1] := AuxR;

          AuxW := RankVector1[j];
          RankVector1[j] := RankVector1[j+1];
          RankVector1[j+1] := AuxW;
        end;
      end;
    end;

  (* Now apply OWA *)
  AuxR := 0;

  for i := 1 to NumCriteria do
    AuxR := AuxR + (Values[i] * Weights[i]);

  OWAAcc := OWAAcc + AuxR;

  (* write more data *)
  for i := 1 to NumCriteria do
  begin
    RankVector2[RankVector1[i]] := i;
    Rank2Acc[RankVector1[i]] := Rank2Acc[RankVector1[i]] + i;
  end;

  if ((Counter mod ReportInterval) = 1) then
  begin
    write(OutFile, Counter);

    for i := 1 to NumCriteria do
    begin
      if (Counter <> 1) then
        AuxR2 := ValueAcc[i] / ReportInterval
      else
        AuxR2 := ValueAcc[i];
      write(OutFile, ',', AuxR2:1:5);
      ValueAcc[i] := 0;
    end;

    for i := 1 to NumCriteria do
    begin
      if (Counter <> 1) then
        AuxR2 := Rank2Acc[i] / ReportInterval
      else
        AuxR2 := Rank2Acc[i];
      write(OutFile, ',', AuxR2:1:5);
      Rank2Acc[i] := 0;
    end;

    if (Counter <> 1) then
      AuxR2 := OWAAcc / ReportInterval
    else
      AuxR2 := OWAAcc;
    writeln(OutFile, ',', AuxR2:1:5);
    OWAAcc := 0;
  end;

  Evaluate := AuxR;
end;

(*    *)

end.